Crash in NS_OutputStreamIsBuffered

RESOLVED FIXED in Firefox 53

Status

()

defect
P2
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: mccr8, Assigned: catalinb)

Tracking

(Blocks 1 bug, {crash})

unspecified
mozilla55
Unspecified
Windows 10
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox52 unaffected, firefox-esr52 unaffected, firefox53 fixed, firefox54 fixed, firefox55 fixed)

Details

(crash signature, )

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-7add4b55-5b0e-470b-87ac-e91c22170223.
=============================================================

Null deref crash in NS_OutputStreamIsBuffered. This is not a common crash. I only see a couple of crashes, all from the same install time, all on the site https://una.im/ 

I was able to reproduce this crash by going to the site, then scrolling around, then closing the tab and later going to the page again: bp-c96a72da-f97b-4469-b90a-4f0a72170223

It is called like this, so I guess responseBody is null:
if (!NS_OutputStreamIsBuffered(responseBody)) {

That is set like this earlier in the method:
    nsCOMPtr<nsIOutputStream> responseBody;
    rv = mInterceptedChannel->GetResponseBody(getter_AddRefs(responseBody));
    if (NS_WARN_IF(NS_FAILED(rv))) {
      return;
    }

However, the definition of GetResponseBody seems to be:
InterceptedChannelBase::GetResponseBody(nsIOutputStream** aStream)
{
  NS_IF_ADDREF(*aStream = mResponseBody);
  return NS_OK;
}

which means that we can return null on success.
Reporter

Updated

2 years ago
Catalin, can you take a look?
Flags: needinfo?(catalin.badea392)
Assignee

Comment 2

2 years ago
I can see the crash in nightly as well.
Flags: needinfo?(catalin.badea392)
Assignee: nobody → catalin.badea392
Priority: -- → P2
Assignee

Comment 3

2 years ago
Oh, I thought I assigned myself to the bug.
Assignee

Comment 4

2 years ago
If the SW throws after calling respondWith(), the reset interception runnable
will race with the promise passed to respondWith(). Note, we shouldn't reset
interception on JS errors. ResetInterception will set
mInterceptedChannel->mResponseBody to null, which will cause a crash when
the promise is resolved.

This patches fixes both issues, but the WPT test is a bit of a stretch because
I'm using a setTimeout to force the crash conditions in firefox.
Attachment #8847279 - Flags: review?(bugmail)
I think this bug ends up being a continuation of bug 1271692 in response to https://github.com/w3c/ServiceWorker/issues/896.
Blocks: 1271692
Comment on attachment 8847279 [details] [diff] [review]
Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after respondWith().

Review of attachment 8847279 [details] [diff] [review]:
-----------------------------------------------------------------

Huh, so this race is extra bad because InterceptedChannel{Base,Chrome,Content} aren't quite thread-safe and so the method calls on InterceptedChannel* can interleave.
* ResumeRequest invokes ResetInterception() on the main thread (asserted by ResumeRequest, InterceptedChannel* doesn't care).
* RespondWithHandler::ResolvedCallback invokes InterceptedChannelBase::GetResponseBody() on the worker thread (asserted).
* InterceptedChannel{Base,Content,Chrome} don't use locks/monitors anywhere.

Invariant-wise, the other SW logic that might impact mResponseBody is FinishResponse::Run which (on the main thread, asserted) invokes FinishSynthesizedResponse.  So for the SW logic to be safe, it's required that we ensure calls to ResetInterception() or FinishSynthesizedResponse() can't possibly overlap.  The change to FetchEventRunnable::DispatchFetchEvent takes care of one of the ResumeRequest instantiation sites.  The other is its Cancel() method which is mutually exclusive with WorkerRun() having run.  So that's good.  And FinishSynthesizedResponse happens stricly after RespondWithHandler::ResolvedCallback completes, so that's fine too.

Restating stuff:
- The new test covers :bkelly's second example from https://github.com/w3c/ServiceWorker/issues/896#issue-154045947 which bug 1271692 did not add a test for.

I'm giving you an r+, but I'm not a peer for this code, so you still need to needinfo :bkelly or some other DOM or Workers peer to defer or do their own review.

::: testing/web-platform/tests/service-workers/service-worker/resources/respond-then-throw-worker.js
@@ +1,3 @@
> +self.addEventListener('fetch', function(event) {
> +    event.respondWith(new Promise(function(res) {
> +        setTimeout(function() { res(new Response('intercepted')); }, 300);

As I understand the scheduling issue, without your fix, the throw("error") below would resume in a ResumeRequest being dispatched to the main thread at the conclusion of this event handler.  We want to ensure the resolve(), which will be entirely serviced on the worker thread, will happen after that.

The setTimeout(300) can be replaced by a postMessage ping/pong and a setTimeout delaying the ping or the pong so that the pong happens strictly after the ResumeRequest runnable is dispatched.  I'd probably put the setTimeout with the ping and the comment in here.

It's still an implementation dependent check, but I think a case can be made for it being a likely implementation error.  And most importantly, it covers our crasher!  The comment might go something like this:
Ensure that throwing an error from the fetch handler does not cancel the still-pending respondWith().  We delay resolving the promise by both a setTimeout and a postMessage ping-pong so that any cancellation logic has the opportunity to be propagated to the page and back to the worker.
Attachment #8847279 - Flags: review?(bugmail) → review+
I defer to :asuth to review this bug.  Thanks!

Comment 9

2 years ago
Pushed by catalin.badea392@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ce518ccb466
Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after .respondWith(). r=asuth

Comment 10

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9ce518ccb466
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Please request Aurora/Beta approval on this when you get a chance.
Flags: needinfo?(catalin.badea392)
Assignee

Updated

2 years ago
Attachment #8847279 - Attachment is obsolete: true
Assignee

Comment 12

2 years ago
Comment on attachment 8849103 [details] [diff] [review]
Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after .respondWith().


Approval Request Comment
[Feature/Bug causing the regression]: Service Worker null deref
[User impact if declined]: Crash in websites using SW
[Is this code covered by automated tests?]: Yes. WPT test
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: No
[Why is the change risky/not risky?]: SW specific
[String changes made/needed]: none

Note when merging, the changes to the manifest file will probably not apply, just discard them and run ./mach wpt-manifest-update
Flags: needinfo?(catalin.badea392)
Attachment #8849103 - Flags: approval-mozilla-beta?
Attachment #8849103 - Flags: approval-mozilla-aurora?
Comment on attachment 8849103 [details] [diff] [review]
Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after .respondWith().

Fix a crash related to SW. Aurora54+ & Beta53+.
Attachment #8849103 - Flags: approval-mozilla-beta?
Attachment #8849103 - Flags: approval-mozilla-beta+
Attachment #8849103 - Flags: approval-mozilla-aurora?
Attachment #8849103 - Flags: approval-mozilla-aurora+
(In reply to Cătălin Badea (:catalinb) from comment #12)
> Comment on attachment 8849103 [details] [diff] [review]
> Fix crash in respondWith resolved callback. Don't reset interception if the
> sw throws after .respondWith().
> 
> 
> Approval Request Comment
> [Feature/Bug causing the regression]: Service Worker null deref
> [User impact if declined]: Crash in websites using SW
> [Is this code covered by automated tests?]: Yes. WPT test
> [Has the fix been verified in Nightly?]: Yes.
> [Needs manual test from QE? If yes, steps to reproduce]: No.
> [List of other uplifts needed for the feature/fix]: none
> [Is the change risky?]: No
> [Why is the change risky/not risky?]: SW specific
> [String changes made/needed]: none
> 
> Note when merging, the changes to the manifest file will probably not apply,
> just discard them and run ./mach wpt-manifest-update

yes this happened when trying to uplift this patch. Can you provide a rebased/aurora-beta patch ? Thanks!
Flags: needinfo?(catalin.badea392)
Flags: needinfo?(catalin.badea392)
Setting qe-verify- based on Catalin's assessment on manual testing needs (see Comment 12) and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.