Closed Bug 1416879 Opened 6 years ago Closed 6 years ago

Crash in mozilla::net::HttpChannelParentListener::OnDataAvailable

Categories

(Core :: DOM: Service Workers, defect, P1)

58 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 blocking fixed
firefox59 + fixed

People

(Reporter: philipp, Assigned: asuth)

References

Details

(6 keywords, Whiteboard: [necko-triaged][adv-main58+])

Crash Data

Attachments

(10 files, 3 obsolete files)

2.50 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
12.38 KB, patch
bkelly
: review+
mayhemer
: review+
Details | Diff | Splinter Review
4.62 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
3.48 KB, patch
baku
: review+
Details | Diff | Splinter Review
7.78 KB, patch
baku
: review+
bkelly
: feedback+
Details | Diff | Splinter Review
20.07 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
2.70 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
1.90 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
2.41 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
9.86 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is
report bp-f200e663-574e-45bf-8c4e-e578d0171106.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::net::HttpChannelParentListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int) 	netwerk/protocol/http/HttpChannelParentListener.cpp:109
1 	xul.dll 	mozilla::net::HttpChannelParent::DivertOnDataAvailable(nsTString<char> const&, unsigned __int64 const&, unsigned int const&) 	netwerk/protocol/http/HttpChannelParent.cpp:1230
2 	xul.dll 	mozilla::net::ChannelEventQueue::RunOrEnqueue(mozilla::net::ChannelEvent*, bool) 	netwerk/ipc/ChannelEventQueue.h:215
3 	xul.dll 	mozilla::net::HttpChannelParent::RecvDivertOnDataAvailable(nsTString<char> const&, unsigned __int64 const&, unsigned int const&) 	netwerk/protocol/http/HttpChannelParent.cpp:1192
4 	xul.dll 	mozilla::net::PHttpChannelParent::OnMessageReceived(IPC::Message const&) 	ipc/ipdl/PHttpChannelParent.cpp:766
5 	xul.dll 	mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) 	ipc/ipdl/PContentParent.cpp:3201
6 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp:2119
7 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) 	ipc/glue/MessageChannel.cpp:2049
8 	xul.dll 	mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) 	ipc/glue/MessageChannel.cpp:1895
9 	xul.dll 	mozilla::ipc::MessageChannel::MessageTask::Run() 	ipc/glue/MessageChannel.cpp:1928
10 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1037
11 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:97
12 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
13 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
14 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:158
15 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:230
16 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:288
17 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4675
18 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4837
19 	xul.dll 	mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/Bootstrap.cpp:49
20 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:304
21 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:111
22 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:283
23 	kernel32.dll 	BaseThreadInitThunk 	
24 	ntdll.dll 	RtlUserThreadStart
=============================================================

crash reports with this signature are getting more common during the 58 cycle - they are happening in   MOZ_RELEASE_ASSERT(!mSuspendedForDiversion, "Cannot call OnDataAvailable if suspended for diversion!") added in bug 975338.
SC, looks like something related to your Pbg-OMT job?
Flags: needinfo?(schien)
The crash history can be traced back to Firefox 38. It shows similar crash rates before and after Firefox 56, which is the version PBg-Http started to land, so might not be related to PBg-OMT.

One comment mentioned using data throttling in RDW to interfere the OnDataAvailable timing will trigger this release assertion. My theory is that intercepted stream listener might break the diversion event sequence because suspension for diversion is only notified to HttpChannelParentListener but not all succeeding listeners. The intercepted stream listener might still populate OnDataAvailable during the suspension, thus break the contract of channel diversion procedure.
Flags: needinfo?(schien)
Priority: -- → P2
Whiteboard: [necko-triaged]
Hmm...checking the crash report again and found 58.0b4 have much higher crash rate than other version. Raise to P1 since it is a frequent chrome process crash.
Priority: P2 → P1
Here are my findings:

- The crash volume ramp up after Firefox 58 hit beta channel.
- The largest change around channel diversion landed on release 58 before that build is Bug 1391693.
- Several user comments in recent crash report indicate that Firefox crashed while downloading email attachment from Office 365
- Over 99% of the crashes happen on Windows.

The next step is to find stable STR and get MOZ_LOG for analysis, mark qawanted.
Keywords: qawanted
See Also: → 1391693
Andrew, any chance this is related to the channel diversion bug you took for me?
Flags: needinfo?(bugmail)
I tried to find some STR for this crash but I was not successful in doing so.

Environment:
- Windows 7 64/32bit, Windows 10 64bit
- Firefox 58.0b4 and Firefox 58.0b5
- New and dirty profiles

What I tested:
- Downloading attachments and opening them from Outlook, YahooMail, Gmail.
- Downloading pdf files, image, videos and other files from different websites, cloud services etc while watching videos from facebook, youtube, twitch and others.

Unfortunately we don't have a license of Office 365 (I saw in crash comments that it came up) so I could not cover that scenario. Please let me know if I can help with something further on.
judging by the comments and curve of the spike on nightly, this might be the same as bug 1418795.
Crash Signature: [@ mozilla::net::HttpChannelParentListener::OnDataAvailable] → [@ mozilla::net::HttpChannelParentListener::OnDataAvailable] [@ mozilla::net::HttpChannelParent::DivertOnDataAvailable]
See Also: → 1418795
A very high percentage of the mac crash are https://outlook.live.com, operating with their inbox or attachments.
On Mac there is also the following note on the sign in page: "You're seeing our new sign-in experience
Go back to the old one." So the crash on Mac could be related to that change.
(In reply to [:philipp] from comment #7)
> judging by the comments and curve of the spike on nightly, this might be the
> same as bug 1418795.

Marking as a security bug for now because there's a clear UAF and content can cause it to happen.

It looks like they're correlated and probably inter-related, but may differ a bit.  In the browser chrome test case I created for bug 1418795, I've managed to reproduce this case.

The assertion is because:
- HttpChannelParent::ResponseSynthesized() does `mWillSynthesizeResponse = false;` then `MaybeFlushPendingDiversion();`.
- MaybeFlushPendingDiversion() invokes DivertTo().
- DivertTo() only does a non-dangerous thing for synthesized responses if mWillSynthesizeResponse == true, which it no longer is.  
- The wrong thing is to send a pair of "FlushedForDiversion" and "DivertComplete" messages to the child.
- They are currently wrong because they will result in mSynthesizedResponsePump->Resume() (which can trigger DivertOnDataAvailable messages to be sent independent of mEventQ) being invoked followed by mEventQ->Resume() (which will trigger a DivertComplete) message.
- An assertion fires (debug builds) or UAF (opt builds) if we get a DivertOnDataAvailable after a DivertComplete message.
- Because the synthesized pump gets to dispatch things before mEventQ, it can get in a single DivertOnDataAvailable message before DivertComplete is sent.  However, if there's anything more to pump, we'll get a DivertOnDataAvailable after the DivertComplete and kaboom.
- The existing SW browser_download.js test has a body of 'service worker generated download' which fits in a single message because it's not actually a stream and is entirely synchronously available.

Unfortunately, fixing things so interception still works is not as simple as not clearing mWillSynthesizeResponse.  We do need to eventually progress further into DivertTo() and StartDiversion() so that OnStartRequest/etc. will be generated.  It looks like our non-streaming assumptions weren't entirely eliminated, because currently it's really not safe to invoke MaybeFlushPendingDiversion() unless we're sure the child channel can pump the entirety of the intercepted data in a single pump.

The child pump state machine needs to be corrected so that DivertComplete is deferred until after DivertOnStopRequest is sent by the synthesis pump.  Or something roughly equivalent in terms of the HttpChannelParentListener state machine.

Clearing the NI for now so I can be re-NI'ed, but I plan to still work on this with a high priority.
Assignee: nobody → bugmail
Group: core-security
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
There are 125 crashes in nightly 59 and 6522 crashes in beta 58 (the signature 'mozilla::net::HttpChannelParentListener::OnDataAvailable' is ranked #2 in beta top-crashers for browser process).
About a thousand crashes a day on beta, marking as blocker.
Group: core-security → network-core-security
Flags: needinfo?(bugmail)
I am actively working this and should have a fix soon.
(In reply to Andrew Sutherland [:asuth] from comment #13)
> I am actively working this and should have a fix soon.

Any updates?
Tracking for 59 as well. What should we rate this? sec-high at least ?
Component: Networking → DOM: Service Workers
Crash Signature: [@ mozilla::net::HttpChannelParentListener::OnDataAvailable] [@ mozilla::net::HttpChannelParent::DivertOnDataAvailable] → [@ mozilla::net::HttpChannelParentListener::OnDataAvailable] [@ mozilla::net::HttpChannelParent::DivertOnDataAvailable] [@ mozilla::net::HttpChannelParentListener::OnStopRequest]
(In reply to Andrew Sutherland [:asuth] from comment #10)
> (In reply to [:philipp] from comment #7)
> > judging by the comments and curve of the spike on nightly, this might be the
> > same as bug 1418795.
> 
> Marking as a security bug for now because there's a clear UAF and content
> can cause it to happen.

Andrew: Where's the clear UAF?  You didn't link to a crash report; perhaps you reproed this locally and showed it could be.

the 28000ish beta/nightly crashes in the last month are all nullptrs in crashstats.

csectype-uaf and sec-high based on your comment (for now)
(In reply to Randell Jesup [:jesup] from comment #16)
> (In reply to Andrew Sutherland [:asuth] from comment #10)
> > (In reply to [:philipp] from comment #7)
> > > judging by the comments and curve of the spike on nightly, this might be the
> > > same as bug 1418795.
> > 
> > Marking as a security bug for now because there's a clear UAF and content
> > can cause it to happen.
> 
> Andrew: Where's the clear UAF?  You didn't link to a crash report; perhaps
> you reproed this locally and showed it could be.

The specific direct UAF is a null-deref as you surmise from the crash reports, we reference the nulled out mParentListener at:
https://searchfox.org/mozilla-central/rev/22c55eb7b7e6494a8615a7af3b613ff899d2cdba/netwerk/protocol/http/HttpChannelParent.cpp#1230

I do get that we don't treat null-derefs as a huge deal because they reliably crash, although frustratingly I feel like I read a post by someone (you?) somewhere recently that covered the extended rationale behind that and our severity ratings in context, but I cannot for the life of me find it across the web or my feed reader or my email client.  All I can find is https://wiki.mozilla.org/Security_Severity_Ratings which literally does not include the word "null".

I was a bit concerned that the violation of the state machine could be leveraged further than the null deref because there is so much happening in the intercept diversion case and I didn't look into the certificate saving diversion case.  Also, I had done most of my testing against a debug build that trips much earlier over asserts, rather than an ASAN or other build that would notice the other dangerous stuff we get up to before we reach the crash, so again, I set the flag out of caution.

This probably wants to be a sec-low for a little longer until I finish up, then hopefully we can drop the flag if we are only dealing with null derefs.
Attached patch Test that needs part of the part 2 (obsolete) — — Splinter Review
Flags: needinfo?(bugmail)
UAF -> nullptr; sec-low.  
UAF means actually reading/writing/executing memory after that memory is freed, which this doesn't do, thankfully.
Andrew told me the problem he is running into is that his new test leaks.  I took a look and found:

1. The test leaks because the channels are never canceled.  So they are still processing the download data when the leak checks run.
2. The channels are not canceled because nsExternalAppHandler::Cancel() is called when the download cancel button is pressed, but the nsIChannel::Cancel() methods are not called.

I did some quick attempts at propagating the cancel call, but I couldn't get it to work in the short time I had.

I recommend we make this test download a small limited amount of data that completes in a bounded time frame.  We can verify that the stream is complete before letting the test finish.  We can also maybe verify that it was not written to disk.

We should then file a follow-up bug to actually cancel nsIChannel requests when the download cancel button is pressed.
Also note, this failure and fix make perfect sense to me now as well.

The service worker interception code originally buffered all data into memory and then performed a single OnDataAvailable() when it synthesized on the channel.  Apparently the synthetic diversion code depended on this behavior since it completed the diversion nearly immediately without waiting for the synthesis to stop.

In bug 1204254 I changed service worker interception to stream response bodies.  This broke the synthetic diversion code since the immediate completion cut off the synthesis mid-stream.  Our tests did not catch it because they previously only used relatively small bodies which still fit in a single OnDataAvailable() even with the streaming body code.

Andrew's fix is to make diversion wait to complete until the synthesized body is complete.
I would also note that clearly there is something that does stop the channels somehow.  If I start downloading an ubuntu.iso and then click the "x" in the download panel to stop it, the network data also stops.  I just don't understand why nsIChannel cancel is not called there or is not called with the download usage in this test.
sec-low, but number of crashes is high, which is why this is blocking 58.  We need a solution ASAP if it's getting into 58.  Reviews?
Flags: needinfo?(bugmail)
Flags: needinfo?(bkelly)
I believe we have a fix and the test just needs to be changed to stop properly.  This seems doable this week.  For easy reference, next merge day is Jan 22.
Flags: needinfo?(bkelly)
Jan 22 is the final merge date. The first merge of 59 to Beta is on the 11th.
I have working tests with full cancellation propagation, cleaning up the patch stack now.  (I did pursue the simpler comment 21 approach of deferring handling cancellation, but ran into the issue that the related bug 1418795 involves a variant on cancellation.)
Diversion for intercepted channels with a synthesized response is a
special case.  It is not appropriate to send DivertComplete when
mEventQ has been drained, because we are not dealing with the usual
mEventQ-enqueued OnDataAvailable payloads that had been received over
the network and sent down to the child.  In this case, all the data
originates in the child and does not go through mEventQ.  As such,
the correct place to send DivertComplete is at OnStopComplete for the
synthesized response.
Attachment #8940160 - Flags: review?(bkelly)
Flags: needinfo?(bugmail)
The diversion mechanism never expected to be dealing with data sourced
from the content process, but that's exactly what happens with
ServiceWorker-intercepted channels with the current child-intercept
situation (which is being fixed).

In order to allow timely cancellation of diverted intercepted
channels, there needs to be a way to relay to the HttpChannelChild
that it needs to be canceled so that the synthesized pump can be
canceled and diversion can be marked as complete.  This patch adds
such a mechanism to ADivertableParentChannel and PHttpChannel for the
exclusive use of InterceptedHttpChannel and then uses it.
Attachment #8940162 - Flags: review?(dd.mozilla)
Attachment #8940162 - Flags: review?(bkelly)
The SyntheticDiversionListener needs to handle the case where the IPC
connection is gone.  This patch avoids calling Send* methods which will
crash the content process if the actor has already been destroyed.
Additionally, OnDataAvailable will return an error in such a case so
that the caller can properly handle the error rather than continuing to
attempt to send data to a listener that doesn't care.  This latter
change is an artifact of a previous hack attempt to fix a related
diversion issue that is probably not required for this stack, but makes
sense as a fix, so I've left it in.
Attachment #8940163 - Flags: review?(bkelly)
In the scenario where a ServiceWorker returns a pass-through fetch via
`evt.respondWith(fetch("underlying"))`, in order for the "underlying"
HTTP channel to be canceled when the outer HTTP channel is canceled,
FetchDriver's OnDataAvailable method needs to return an error when
the output pipe experiences an error.

Unfortunately, the contract for ReadSegments is effectively that it
returns NS_OK regardless of what the rv of the write handler returned,
so relying on the returned rv is insufficient.  And because various
Write*() methods will all fast-path to returning NS_OK if a count of 0
is passed, it's necessary to infer a closed/broken pipe by noticing
that we tried to write more than 0 bytes of data but 0 bytes were
written.  (This is safe because the pipe we write into was created
by FetchDriver::OnStartRequest which explicitly creates an infinite
pipe, so it's not possible for the write to fail due to lack of space
in the pipe.)
Attachment #8940165 - Flags: review?(bkelly)
Attachment #8940165 - Flags: review?(bkelly) → review?(amarchesini)
Currently, FetchStreamReader never signals to the JS stream code that
the reader has been closed.  This means that when a ServiceWorker
passes a ReadableStream to respondWith and the HTTP Channel gets
canceled, the JS code will keep generating the stream without ever
realizing the data's not going anywhere.  It's necessary to cancel
the reader.  Or do something like that, this seems to work!
Attachment #8940169 - Flags: review?(amarchesini)
This adds a test where we have a ServiceWorker return 2 different types
of streams that Firefox recognizes as downloads which are handled by
diversion of the channel to the parent.  The diverted downloads are
then cancelled and we verify that cancellation actually results in the
underlying connections being closed and/or the ServiceWorker notified.

Our 2 types of streams are:
1. A pass-through stream that is incrementally delivered through use of
   an .sjs file that delivers data using setInterval.
2. A SW-authored ReadableStream (which is not enabled by default, so we
   set a pref.)

Determining when the .sjs's stream is canceled is accomplished by
opening a second "monitor" connection that only completes when the
streaming connection is closed.

In all cases we differentiate between cancelation and timeouts firing.
Attachment #8940170 - Flags: review?(bkelly)
Attachment #8938665 - Attachment is obsolete: true
Attachment #8938666 - Attachment is obsolete: true
Attachment #8940165 - Flags: review?(amarchesini) → review+
Attachment #8940160 - Flags: review?(bkelly) → review+
Comment on attachment 8940162 [details] [diff] [review]
Part 2: Allow for diversion cancellation and trigger for intercepted channels

Review of attachment 8940162 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/InterceptedHttpChannel.cpp
@@ +505,5 @@
> +  // intercepted channel could be a long-running stream, we need to request that
> +  // cancellation be triggered in the child, completing the diversion and
> +  // allowing cancellation to run to completion.
> +  if (mDiverting) {
> +    Unused << mParentChannel->CancelDiversion();

Do we need to do this in nsHttpChannel as well?

After reading the comment in ADivertableParentChannel.h I guess the answer is "no".
Attachment #8940162 - Flags: review?(bkelly) → review+
Comment on attachment 8940163 [details] [diff] [review]
Part 3: (Also Bug 1418795) SyntheticDiversionListener should handle !mIPCOpen

Review of attachment 8940163 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ -614,5 @@
>  
>    DoOnStartRequest(this, mListenerContext);
>  }
>  
> -namespace {

Why remove the anonymous namespace?
Attachment #8940163 - Flags: review?(bkelly) → review+
Comment on attachment 8940169 [details] [diff] [review]
Part 5: FetchStreamReader needs to cancel its reader when it encounters write errors

Review of attachment 8940169 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/fetch/FetchStreamReader.cpp
@@ +33,5 @@
>    bool Notify(Status aStatus) override
>    {
>      if (!mWasNotified) {
>        mWasNotified = true;
> +      mReader->CloseAndRelease(nullptr, NS_ERROR_DOM_INVALID_STATE_ERR);

There is a jscontext in the WorkerHolder's WorkerPrivate.  Is there a reason not to use it here?

@@ +148,5 @@
> +  if (aCx) {
> +    MOZ_ASSERT(mReader);
> +
> +    // Let's use a generic error.
> +    RefPtr<DOMException> error = DOMException::Create(NS_ERROR_DOM_TYPE_ERR);

NS_ERROR_DOM_ABORT_ERR might be a bit better?  The default type error message does not fit very well.  Or why not use the value passed in?

@@ +153,5 @@
> +
> +    JS::Rooted<JS::Value> errorValue(aCx);
> +    if (ToJSValue(aCx, error, &errorValue)) {
> +      JS::Rooted<JSObject*> reader(aCx, mReader);
> +      JS::ReadableStreamReaderCancel(aCx, reader, errorValue);

Do we want to call cancel or just gracefully close for NS_BASE_STREAM_CLOSED?

::: dom/fetch/FetchStreamReader.h
@@ +40,5 @@
>    void
>    RejectedCallback(JSContext* aCx, JS::Handle<JS::Value> aValue) override;
>  
>    void
> +  CloseAndRelease(JSContext* aCx, nsresult aStatus);

Maybe add a comment that the context is optional here?
Attachment #8940169 - Flags: feedback+
Comment on attachment 8940170 [details] [diff] [review]
Part 6: Test cancellation of diverted client-intercepted streams

Review of attachment 8940170 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/test/serviceworkers/download_canceled/sw_download_canceled.js
@@ +77,5 @@
> +              what: filename, why: "canceled", message: e.message, ticks: count
> +            });
> +          }
> +        }
> +        intervalId = setInterval(tick, TICK_INTERVAL);

Why use setInterval() instead of the ReadableStream pull mechanism?  That would be an easier way to just saturate the stream.
Attachment #8940170 - Flags: review?(bkelly) → review+
Comment on attachment 8940162 [details] [diff] [review]
Part 2: Allow for diversion cancellation and trigger for intercepted channels

:ddamjano is not on IRC right now, so adding :mayhemer as an alternate reviewer.
Attachment #8940162 - Flags: review?(honzab.moz)
(In reply to Ben Kelly [:bkelly] from comment #37)
> Do we need to do this in nsHttpChannel as well?
> 
> After reading the comment in ADivertableParentChannel.h I guess the answer
> is "no".

Yeah, the rationale there of "normal diversion is only going to be some small number of network packets rather than an unbounded stream and so cancellation isn't necessary" is my overall rationale.  Also, minimizing control-flow changes for something we want to uplift.  Also, we can get rid of that after we switch to parent intercept, probably.
Comment on attachment 8940163 [details] [diff] [review]
Part 3: (Also Bug 1418795) SyntheticDiversionListener should handle !mIPCOpen

Review of attachment 8940163 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ -614,5 @@
>  
>    DoOnStartRequest(this, mListenerContext);
>  }
>  
> -namespace {

mIPCOpen and the related RemoteChannelExists() helper are both marked private, so I had to make SyntheticDiversionListener a friend of HttpChannelChild so it could access the private member.

Making it an inner class could also have been an option I suppose, but this minimized patch size.
Comment on attachment 8940169 [details] [diff] [review]
Part 5: FetchStreamReader needs to cancel its reader when it encounters write errors

Review of attachment 8940169 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/fetch/FetchStreamReader.cpp
@@ +33,5 @@
>    bool Notify(Status aStatus) override
>    {
>      if (!mWasNotified) {
>        mWasNotified = true;
> +      mReader->CloseAndRelease(nullptr, NS_ERROR_DOM_INVALID_STATE_ERR);

I wasn't quite sure what to do, so I erred on the side of not doing additional work on worker termination and leaving it to future web platform tests.  I included some hand-waving in the comment on CloseAndRelease about not doing further work.

@@ +148,5 @@
> +  if (aCx) {
> +    MOZ_ASSERT(mReader);
> +
> +    // Let's use a generic error.
> +    RefPtr<DOMException> error = DOMException::Create(NS_ERROR_DOM_TYPE_ERR);

I copied this logic from FetchStream::ErrorPropagation.  The rationale for using a hard-coded error was that the ::Write calls propagate the explicit rv from the bowels of the stream/necko subsystems, and some sort of mapping is necessary in that case beyond NSResultToNameAndMessage.  Alternately, the Write cases could be made to use a specific DOM error and this method could then use what's passed in since this file would control all the errors.  Again, I was figuring that might be addressed by web-platform-tests correctness work, but am happy to change here.

@@ +153,5 @@
> +
> +    JS::Rooted<JS::Value> errorValue(aCx);
> +    if (ToJSValue(aCx, error, &errorValue)) {
> +      JS::Rooted<JSObject*> reader(aCx, mReader);
> +      JS::ReadableStreamReaderCancel(aCx, reader, errorValue);

I wasn't sure quite what to do here either, but the cancel call-chain ends up at https://searchfox.org/mozilla-central/rev/b24e6342d744c5a83fab5c15972e11eeb69d68e6/js/src/builtin/Stream.cpp#1365 in such a case, so it seemed fine:

    // Step 2: If stream.[[state]] is "closed", return a new promise resolved
    //         with undefined.
    if (stream->closed())
        return PromiseObject::unforgeableResolve(cx, UndefinedHandleValue);
Comment on attachment 8940170 [details] [diff] [review]
Part 6: Test cancellation of diverted client-intercepted streams

Review of attachment 8940170 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/test/serviceworkers/download_canceled/sw_download_canceled.js
@@ +77,5 @@
> +              what: filename, why: "canceled", message: e.message, ticks: count
> +            });
> +          }
> +        }
> +        intervalId = setInterval(tick, TICK_INTERVAL);

That's a good idea.  The goal was to have data flowing without risking making the test slow because things were getting overwhelmed and setInterval seemed sufficient so I didn't worry about such flow control mechanisms.  (Also, given the number of infinite pipes we use in the code such as in FetchDriver and our IPC stream without back-pressure mechanisms, I'd be a little reluctant to do that without better understanding how we bridge to the streams' model.)

Given that on local debug builds I was seeing the test report it took ~30 ticks for the cancellation to go through, it seemed like the value was good enough.  (The test reports tick count in an info() line.)
Comment on attachment 8940165 [details] [diff] [review]
Part 4: FetchDriver needs to propagate write failures

Review of attachment 8940165 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/fetch/FetchDriver.cpp
@@ +1114,5 @@
> +  // errors.  Unfortunately, nsIOutputStream has an ill-conceived contract when
> +  // taken together with ReadSegments' contract, because the pipe will just
> +  // NS_OK if we try and invoke its Write* functions ourselves with a 0 count.
> +  // So we must just assume the pipe is broken.
> +  if (aRead == 0 && aCount != 0 && aRead == 0) {

double-checking aRead is REALLY a zero ? :)
Comment on attachment 8940162 [details] [diff] [review]
Part 2: Allow for diversion cancellation and trigger for intercepted channels

Review of attachment 8940162 [details] [diff] [review]:
-----------------------------------------------------------------

looks ok when considering only this single patch.  I somewhat went through the other patches as well, but it's still hard to put this to the whole picture.  hence, r+ with no warranty!
Attachment #8940162 - Flags: review?(honzab.moz) → review+
Comment on attachment 8940169 [details] [diff] [review]
Part 5: FetchStreamReader needs to cancel its reader when it encounters write errors

Review of attachment 8940169 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/fetch/FetchStreamReader.h
@@ +40,5 @@
>    void
>    RejectedCallback(JSContext* aCx, JS::Handle<JS::Value> aValue) override;
>  
>    void
> +  CloseAndRelease(JSContext* aCx, nsresult aStatus);

Yes, add a comment explaining that, with aCx, the stream is canceled.
Attachment #8940169 - Flags: review?(amarchesini) → review+
Attachment #8940162 - Flags: review?(dd.mozilla)
Unhiding this because it is apparently just a null deref.
Group: network-core-security
Attached patch verify-fix.patch (obsolete) — — Splinter Review
This patch fixes failure in --verify.  It also needs bug 1428447 to pass the verify check.  I don't think we need bug 1428447 for uplift, though.
Comment on attachment 8940326 [details] [diff] [review]
verify-fix.patch

Thanks!  I'm going to fold this into Part 6 when I land.  If you're not around within some epsilon of when I realize the tree re-opens and it's not already landed, I'll plan to land your Bug 1428447 first.
Attachment #8940326 - Flags: review+
Sounds good.
https://hg.mozilla.org/integration/mozilla-inbound/rev/714d3942fb10a96e60e14c475e24e640b9ddd8cc
Bug 1416879 - Part 1: DivertComplete should only be sent at OnStopRequest for synthesized responses. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/d156f6b687e1cd8717bf813ab1b944e87b67ed5d
Bug 1416879 - Part 2: Allow for diversion cancellation and trigger for intercepted channels. r=bkelly, r=mayhemer

https://hg.mozilla.org/integration/mozilla-inbound/rev/5453b8a58f0c2c28dc7c407c531c266972bff423
Bug 1416879 - Part 3: (Also Bug 1418795) SyntheticDiversionListener should handle !mIPCOpen. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/8e4fd74e7f5e69df7363bdb560f79dde347ce082
Bug 1416879 - Part 4: FetchDriver needs to propagate write failures. r=baku

https://hg.mozilla.org/integration/mozilla-inbound/rev/994dc643a2ab62f03fef780a58971b476a4b6f4a
Bug 1416879 - Part 5: FetchStreamReader needs to cancel its reader when it encounters write errors. r=baku, f=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/840a6e04bcea7d87e362adf14a37b7c17e20f043
Bug 1416879 - Part 6: Test cancellation of diverted client-intercepted streams. r=bkelly
Andrew, given the short time frame can you request uplift now?  It would be nice to uplift before first merge on Jan 11.
Flags: needinfo?(bugmail)
(In reply to Ben Kelly [:bkelly] from comment #56)
> Andrew, given the short time frame can you request uplift now?  It would be
> nice to uplift before first merge on Jan 11.

Planning to very soon, but there are some perma-oranges on the try push to beta including Bug 1424914 that I need to evaluate.
(Try push was comment 53, https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f8f4b4f4b856fa2383c1f8c1c4898d2a18dec4a.)
Flags: needinfo?(bugmail)
Depends on: 1353556
Backout by cbrindusan@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/ed0571d18d1f
Backed out 6 changesets for bc permafailures on windows on browser_multie10s_update.js r=backout a=backout
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla59 → ---
Use the even more thorough logic from bug 1428652 to wait for the SW to
become redundant.
Attachment #8941276 - Flags: review?(bkelly)
Attachment #8940170 - Attachment is obsolete: true
This test used a fixed setTimeout of 3secs for serving the SW.  This
lower-bounded the test runtime at 6 seconds, plus it was not safe in
the event of a slow test runner.

This set of changes, although a little ugly, improves the logic so that
the SW's transmission is driven by a separate "release" fetch that is
only triggered when both updates have been issued and the first update
failure has been reported.  This ensures we are observing the desired
situation.  There's also a sanity check on the number of times the SW
script is fetched.
Attachment #8941278 - Flags: review?(bkelly)
Attachment #8941275 - Flags: review?(bkelly) → review+
Attachment #8941277 - Flags: review?(bkelly) → review+
Comment on attachment 8941278 [details] [diff] [review]
Part 0c: browser_multie10s_update.js should not use setTimeout

Review of attachment 8941278 [details] [diff] [review]:
-----------------------------------------------------------------

I don't completely understand the sjs, but if you got it from another test and its green on try, then ok.
Attachment #8941278 - Flags: review?(bkelly) → review+
Comment on attachment 8941276 [details] [diff] [review]
Part 0d: Make browser_force_refresh.js wait for the SW to be made redundant

Review of attachment 8941276 [details] [diff] [review]:
-----------------------------------------------------------------

Note, its only safe to do this if you are sure all controlled clients are closed.  Is the tab removed after this uncontrolled?
Attachment #8941276 - Flags: review?(bkelly) → review+
See Also: → 1429794
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3b4b0f2a73c7a6fdcf1389e08ed5f59487a0f3e
Bug 1416879 - Part 1: DivertComplete should only be sent at OnStopRequest for synthesized responses. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/7052da34a62d03ad510ad273f8cad970157fd5ba
Bug 1416879 - Part 2: Allow for diversion cancellation and trigger for intercepted channels. r=bkelly, r=mayhemer

https://hg.mozilla.org/integration/mozilla-inbound/rev/f4ebb9ec34523bfd24d86b3769e5f4185db25d25
Bug 1416879 - Part 3: (Also Bug 1418795) SyntheticDiversionListener should handle !mIPCOpen. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/58c617de48f4b47ffc4cd0f3f03537ff9baa0576
Bug 1416879 - Part 4: FetchDriver needs to propagate write failures. r=baku

https://hg.mozilla.org/integration/mozilla-inbound/rev/0cd43f3de5fb904e27d0dfefdfc10d1c58fea9da
Bug 1416879 - Part 5: FetchStreamReader needs to cancel its reader when it encounters write errors. r=baku, f=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/1cc9cf654ea3bad4ce3a0e4e41d616da0c997625
Bug 1416879 - Part 6: Test cancellation of diverted client-intercepted streams. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/db45cca5049e0869d0561c009b6e0496e339f8d0
Bug 1416879 - Part 0a: Make browser_force_refresh.js clean up after itself. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/c3bc48075e7447581a4ffd4a4bd8ed60b3d9945f
Bug 1416879 - Part 0b: browser_multie10s_update.js needs to protect its invariants, clean-up after itself. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d66c4b6ab0d7dbea4ae2ebd7b843e0e7f95d057
Bug 1416879 - Part 0c: browser_multie10s_update.js should not use setTimeout. r=bkelly

https://hg.mozilla.org/integration/mozilla-inbound/rev/6e926521ce37cadc4fcbd3451542a7975446311e
Bug 1416879 - Part 0d: Move browser_multie10s_update.js into its own directory. r=bkelly
Crash Signature: [@ mozilla::net::HttpChannelParentListener::OnDataAvailable] [@ mozilla::net::HttpChannelParent::DivertOnDataAvailable] [@ mozilla::net::HttpChannelParentListener::OnStopRequest] → [@ mozilla::net::HttpChannelParentListener::OnDataAvailable] [@ mozilla::net::HttpChannelParent::DivertOnDataAvailable] [@ mozilla::net::HttpChannelParentListener::OnStopRequest] [@ mozilla::ipc::LogicError | mozilla::net::PHttpChannelChild::SendDivert…
Andrew, can you request beta uplift?  Do we need a separate request to uplift to release since the initial merge has happened?
Attachment #8940326 - Attachment is obsolete: true
Comment on attachment 8940170 [details] [diff] [review]
Part 6: Test cancellation of diverted client-intercepted streams

(obsoleted this previously when I had meant to obsolete bkelly's verify patch that was folded into this one)
Attachment #8940170 - Attachment is obsolete: false
(In reply to Ben Kelly [:bkelly] from comment #76)
> Andrew, can you request beta uplift?  Do we need a separate request to
> uplift to release since the initial merge has happened?

Yes, although your comment made me double-check what I thought was my 58 try push in comment 73.  I used the wrong base and ended up grafting on top of post-mozilla-central merge beta, rather than before the merge.  The comment 78 try push uses the current "release" tag.  The entire patch stack grafts almost fully cleanly except for trivial browser.ini merges required in part 6 and part 0d... hg shelled out to kdiff3 for me which automatically fixed them...

Anyways, requesting now.
Flags: needinfo?(bugmail)
Comment on attachment 8940160 [details] [diff] [review]
Part 1: DivertComplete should only be sent at OnStopRequest for synthesized responses

This is a request for 58.  I'm not sure what flag to use given our position in the merge.

Approval Request Comment
[Feature/Bug causing the regression]:
Some combination of:
- Bug 1204254 which made ServiceWorkers stream their results, revealing a longstanding bug in diversion, which is bug 975338.
- Bug 1220681 which introduced the un-checked use of IPC for diversion support.
- outlook.com starting to use ServiceWorkers.

[User impact if declined]:
Crashes.  When we don't crash, downloads involving ServiceWorkers will continue to do CPU and network work even after being canceled, although there are no privacy issues from this because the data never actually goes anywhere useful.

[Is this code covered by automated tests?]:
Yes, part 6 includes a thorough automated test.

[Has the fix been verified in Nightly?]:
It hasn't been landed quite long enough for that.

[Needs manual test from QE? If yes, steps to reproduce]:
No.  In practice, I've found it pretty hard (impossible) to trigger the crashing behavior on outlook.com with their beta UI that uses the service worker, at least for me.  At least currently, attachments fated for download go through another origin that lacks a registered ServiceWorker.  It's possible the office 365 path operates differently or this is a recent mitigation for our crashing by the outlook team or they did it for other reasons.

[List of other uplifts needed for the feature/fix]:
All the stuff on this bug, as landed.

[Is the change risky?]:
No, the test is very thorough.

[Why is the change risky/not risky?]:
Have I mentioned the test?

[String changes made/needed]:
No string changes.
Attachment #8940160 - Flags: approval-mozilla-beta?
Attachment #8940165 - Flags: approval-mozilla-beta?
Attachment #8940162 - Flags: approval-mozilla-beta?
Attachment #8940169 - Flags: approval-mozilla-beta?
Attachment #8940170 - Flags: approval-mozilla-beta?
Attachment #8940163 - Flags: approval-mozilla-beta?
Attachment #8941277 - Flags: approval-mozilla-beta?
Attachment #8941278 - Flags: approval-mozilla-beta?
Attachment #8941275 - Flags: approval-mozilla-beta?
Attachment #8941276 - Flags: approval-mozilla-beta?
Comment on attachment 8940160 [details] [diff] [review]
Part 1: DivertComplete should only be sent at OnStopRequest for synthesized responses

This signature has a high volume of crashes in beta. We have to take this. Beta58+.
Attachment #8940160 - Flags: approval-mozilla-release+
Attachment #8940160 - Flags: approval-mozilla-beta?
Attachment #8940160 - Flags: approval-mozilla-beta+
Attachment #8940162 - Flags: approval-mozilla-release+
Attachment #8940162 - Flags: approval-mozilla-beta?
Attachment #8940162 - Flags: approval-mozilla-beta+
Attachment #8940163 - Flags: approval-mozilla-release+
Attachment #8940163 - Flags: approval-mozilla-beta?
Attachment #8940163 - Flags: approval-mozilla-beta+
Attachment #8940165 - Flags: approval-mozilla-release+
Attachment #8940165 - Flags: approval-mozilla-beta?
Attachment #8940165 - Flags: approval-mozilla-beta+
Attachment #8940169 - Flags: approval-mozilla-release+
Attachment #8940169 - Flags: approval-mozilla-beta?
Attachment #8940169 - Flags: approval-mozilla-beta+
Attachment #8940170 - Flags: approval-mozilla-release+
Attachment #8940170 - Flags: approval-mozilla-beta?
Attachment #8940170 - Flags: approval-mozilla-beta+
Attachment #8941275 - Flags: approval-mozilla-release+
Attachment #8941275 - Flags: approval-mozilla-beta?
Attachment #8941275 - Flags: approval-mozilla-beta+
Attachment #8941276 - Flags: approval-mozilla-release+
Attachment #8941276 - Flags: approval-mozilla-beta?
Attachment #8941276 - Flags: approval-mozilla-beta+
Attachment #8941277 - Flags: approval-mozilla-release+
Attachment #8941277 - Flags: approval-mozilla-beta?
Attachment #8941277 - Flags: approval-mozilla-beta+
Attachment #8941278 - Flags: approval-mozilla-release+
Attachment #8941278 - Flags: approval-mozilla-beta?
Attachment #8941278 - Flags: approval-mozilla-beta+
Whiteboard: [necko-triaged] → [necko-triaged][adv-main58+]
You need to log in before you can comment on or make changes to this bug.