Closed Bug 1342255 Opened 8 years ago Closed 8 years ago

Crash in NS_OutputStreamIsBuffered

Categories

(Core :: DOM: Service Workers, defect, P2)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: mccr8, Assigned: catalinb)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-7add4b55-5b0e-470b-87ac-e91c22170223. ============================================================= Null deref crash in NS_OutputStreamIsBuffered. This is not a common crash. I only see a couple of crashes, all from the same install time, all on the site https://una.im/ I was able to reproduce this crash by going to the site, then scrolling around, then closing the tab and later going to the page again: bp-c96a72da-f97b-4469-b90a-4f0a72170223 It is called like this, so I guess responseBody is null: if (!NS_OutputStreamIsBuffered(responseBody)) { That is set like this earlier in the method: nsCOMPtr<nsIOutputStream> responseBody; rv = mInterceptedChannel->GetResponseBody(getter_AddRefs(responseBody)); if (NS_WARN_IF(NS_FAILED(rv))) { return; } However, the definition of GetResponseBody seems to be: InterceptedChannelBase::GetResponseBody(nsIOutputStream** aStream) { NS_IF_ADDREF(*aStream = mResponseBody); return NS_OK; } which means that we can return null on success.
Catalin, can you take a look?
Flags: needinfo?(catalin.badea392)
I can see the crash in nightly as well.
Flags: needinfo?(catalin.badea392)
Assignee: nobody → catalin.badea392
Priority: -- → P2
Oh, I thought I assigned myself to the bug.
If the SW throws after calling respondWith(), the reset interception runnable will race with the promise passed to respondWith(). Note, we shouldn't reset interception on JS errors. ResetInterception will set mInterceptedChannel->mResponseBody to null, which will cause a crash when the promise is resolved. This patches fixes both issues, but the WPT test is a bit of a stretch because I'm using a setTimeout to force the crash conditions in firefox.
Attachment #8847279 - Flags: review?(bugmail)
I think this bug ends up being a continuation of bug 1271692 in response to https://github.com/w3c/ServiceWorker/issues/896.
Blocks: 1271692
Comment on attachment 8847279 [details] [diff] [review] Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after respondWith(). Review of attachment 8847279 [details] [diff] [review]: ----------------------------------------------------------------- Huh, so this race is extra bad because InterceptedChannel{Base,Chrome,Content} aren't quite thread-safe and so the method calls on InterceptedChannel* can interleave. * ResumeRequest invokes ResetInterception() on the main thread (asserted by ResumeRequest, InterceptedChannel* doesn't care). * RespondWithHandler::ResolvedCallback invokes InterceptedChannelBase::GetResponseBody() on the worker thread (asserted). * InterceptedChannel{Base,Content,Chrome} don't use locks/monitors anywhere. Invariant-wise, the other SW logic that might impact mResponseBody is FinishResponse::Run which (on the main thread, asserted) invokes FinishSynthesizedResponse. So for the SW logic to be safe, it's required that we ensure calls to ResetInterception() or FinishSynthesizedResponse() can't possibly overlap. The change to FetchEventRunnable::DispatchFetchEvent takes care of one of the ResumeRequest instantiation sites. The other is its Cancel() method which is mutually exclusive with WorkerRun() having run. So that's good. And FinishSynthesizedResponse happens stricly after RespondWithHandler::ResolvedCallback completes, so that's fine too. Restating stuff: - The new test covers :bkelly's second example from https://github.com/w3c/ServiceWorker/issues/896#issue-154045947 which bug 1271692 did not add a test for. I'm giving you an r+, but I'm not a peer for this code, so you still need to needinfo :bkelly or some other DOM or Workers peer to defer or do their own review. ::: testing/web-platform/tests/service-workers/service-worker/resources/respond-then-throw-worker.js @@ +1,3 @@ > +self.addEventListener('fetch', function(event) { > + event.respondWith(new Promise(function(res) { > + setTimeout(function() { res(new Response('intercepted')); }, 300); As I understand the scheduling issue, without your fix, the throw("error") below would resume in a ResumeRequest being dispatched to the main thread at the conclusion of this event handler. We want to ensure the resolve(), which will be entirely serviced on the worker thread, will happen after that. The setTimeout(300) can be replaced by a postMessage ping/pong and a setTimeout delaying the ping or the pong so that the pong happens strictly after the ResumeRequest runnable is dispatched. I'd probably put the setTimeout with the ping and the comment in here. It's still an implementation dependent check, but I think a case can be made for it being a likely implementation error. And most importantly, it covers our crasher! The comment might go something like this: Ensure that throwing an error from the fetch handler does not cancel the still-pending respondWith(). We delay resolving the promise by both a setTimeout and a postMessage ping-pong so that any cancellation logic has the opportunity to be propagated to the page and back to the worker.
Attachment #8847279 - Flags: review?(bugmail) → review+
I defer to :asuth to review this bug. Thanks!
Pushed by catalin.badea392@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/9ce518ccb466 Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after .respondWith(). r=asuth
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Please request Aurora/Beta approval on this when you get a chance.
Flags: needinfo?(catalin.badea392)
Attachment #8847279 - Attachment is obsolete: true
Comment on attachment 8849103 [details] [diff] [review] Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after .respondWith(). Approval Request Comment [Feature/Bug causing the regression]: Service Worker null deref [User impact if declined]: Crash in websites using SW [Is this code covered by automated tests?]: Yes. WPT test [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: No [Why is the change risky/not risky?]: SW specific [String changes made/needed]: none Note when merging, the changes to the manifest file will probably not apply, just discard them and run ./mach wpt-manifest-update
Flags: needinfo?(catalin.badea392)
Attachment #8849103 - Flags: approval-mozilla-beta?
Attachment #8849103 - Flags: approval-mozilla-aurora?
Comment on attachment 8849103 [details] [diff] [review] Fix crash in respondWith resolved callback. Don't reset interception if the sw throws after .respondWith(). Fix a crash related to SW. Aurora54+ & Beta53+.
Attachment #8849103 - Flags: approval-mozilla-beta?
Attachment #8849103 - Flags: approval-mozilla-beta+
Attachment #8849103 - Flags: approval-mozilla-aurora?
Attachment #8849103 - Flags: approval-mozilla-aurora+
(In reply to Cătălin Badea (:catalinb) from comment #12) > Comment on attachment 8849103 [details] [diff] [review] > Fix crash in respondWith resolved callback. Don't reset interception if the > sw throws after .respondWith(). > > > Approval Request Comment > [Feature/Bug causing the regression]: Service Worker null deref > [User impact if declined]: Crash in websites using SW > [Is this code covered by automated tests?]: Yes. WPT test > [Has the fix been verified in Nightly?]: Yes. > [Needs manual test from QE? If yes, steps to reproduce]: No. > [List of other uplifts needed for the feature/fix]: none > [Is the change risky?]: No > [Why is the change risky/not risky?]: SW specific > [String changes made/needed]: none > > Note when merging, the changes to the manifest file will probably not apply, > just discard them and run ./mach wpt-manifest-update yes this happened when trying to uplift this patch. Can you provide a rebased/aurora-beta patch ? Thanks!
Flags: needinfo?(catalin.badea392)
Flags: needinfo?(catalin.badea392)
Setting qe-verify- based on Catalin's assessment on manual testing needs (see Comment 12) and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: