Closed Bug 1374943 Opened 2 years ago Closed 2 years ago

Crash in mozilla::dom::FetchDriver::OnStopRequest

Categories

(Core :: DOM: Service Workers, defect, critical)

51 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- wontfix
firefox54 --- wontfix
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: philipp, Assigned: asuth)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-8d2087e1-b8a3-4dcb-8eb3-f55bd2170219.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::dom::FetchDriver::OnStopRequest(nsIRequest*, nsISupports*, nsresult) 	dom/fetch/FetchDriver.cpp:724
1 	xul.dll 	mozilla::net::nsStreamListenerWrapper::OnStopRequest(nsIRequest*, nsISupports*, nsresult) 	netwerk/base/nsStreamListenerWrapper.h:30
2 	xul.dll 	mozilla::net::nsStreamListenerTee::OnStopRequest(nsIRequest*, nsISupports*, nsresult) 	netwerk/base/nsStreamListenerTee.cpp:51
3 	xul.dll 	mozilla::net::HttpBaseChannel::DoNotifyListener() 	netwerk/protocol/http/HttpBaseChannel.cpp:2752
4 	xul.dll 	mozilla::net::nsHttpChannel::ContinueHandleAsyncRedirect(nsresult) 	netwerk/protocol/http/nsHttpChannel.cpp:635
5 	xul.dll 	mozilla::net::nsHttpChannel::OnRedirectVerifyCallback(nsresult) 	netwerk/protocol/http/nsHttpChannel.cpp:7501
6 	xul.dll 	mozilla::net::nsAsyncVerifyRedirectCallbackEvent::Run() 	netwerk/base/nsAsyncRedirectVerifyHelper.cpp:40
7 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1067
8 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
9 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:225
10 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
11 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:262
12 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:283
13 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4401
14 	xul.dll 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp:4534
15 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4625

this is a low volume crash signature regressing since firefox 51 on all platforms. timing wise i'm guessing that could be related to the landing of bug 1187335.
the user comment to the report at the top might contain a testcase.
Thanks to my mis-spent time on bug 1373310 and the excellent stack from comment 0, it was easy for me to create a mochitest to re-create the crash.  (Well, I hit the slightly earlier MOZ_ASSERT because I roll DEBUG style.)  Further investigation pending.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
It looks like the comment 0 stack is one of 2 main variants we can expect, depending on the responsible PushRedirectAsyncFunc'ed function.  Comment 0 is for ContinueHandleAsyncRedirect for redirects processed via cache in ReadFromCache, and crashes like bp-61abf0e4-d505-4f0e-a6d9-576600170621 seem to correspond to ContinueProcessResponse3 for redirects not-from-cache via ContinueProcessResponse2 although I don't fully understand that chain yet[1]

We experience this problem when doing fetch("url that returns a redirect", { redirect: "error"}).  nsAsyncRedirectVerifyHelper::Init will veto this off the bat with NS_BINDING_ABORTED.  (At least for non-internal and non-STS upgrades, but we're explicitly dealing with redirects from content here.)

The root problem is that FetchDriver::OnStopRequest assumes that a vetoed redirect will result in NS_FAILED(aStatusCode).  Under ContinueHandleAsyncRedirect, this is not the case for non-"follow" redirect modes.  It only sets mStatus before calling DoNotifyListener (which triggers OnStartRequest and OnStopRequest) if !mLoadInfo->GetDontFollowRedirect()[2].

In this case where mStatus==NS_OK, when OnStartRequest is invoked, it ends up bailing in the redirect="error" mode without creating an mResponse because of the special-case check logic:
    if (mozilla::net::nsHttpChannel::IsRedirectStatus(responseStatus)) {
      if (mRequest->GetRedirectMode() == RequestRedirect::Error) {
        FailWithNetworkError();
        return NS_BINDING_FAILED;
      }
"manual" turns out okay because it continues to create the response.

This leads to the assertion failure/crash in OnStopRequest.

It's not immediately clear the best way to clean up OnStopRequest.  The "error" redirect mode is the only invocation of FailWithNetworkError() with an early return from OnStartRequest that is expected/in-band; all the others are exceptional.  Since it doesn't result in mPipeOutputStream being populated, the NS_FAILED OnStopRequest branch is meaningless for it.  (And the if(mObserver) branch won't be taken, nor will it for any of the others because FailWithNetworkError will have already voided it.)

I think I'll run with treating !mObserver as indicating a failure has already been reported and processed and so the NS_FAILED branch should be taken.  I think this makes sense given that FailWithNetworkError() nulls mObserver out after triggering OnResponseEnd, and the only other place OnResponseEnd is invoked or mObserver is nulled out is in the remainder of OnStopRequest that still assumes mResponse.


1: The stack shows the input pump being responsible for triggering the error.  This suggests that ContinueProcessResponse3 could have continued through to ProcessNormal, losing the rv.  It makes sense that this could then pump the input.  However, the problem is in FetchDriver[3], so I'm not sure it's necessary to reproduce the other failure case.

2: https://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#827

3: The semantics of nsHttpChannel may be debatable, but given the explicit comment of "TODO: stop failing original channel if redirect vetoed?" in ContinueHandleAsyncRedirect, and the multiple paths that don't set mStatus to the NS_BINDING_ABORTED value, it does seem like FetchDriver is in the wrong.
It's worth noting that the comment 0 crash only happens in non-e10s mode.  The reason is that HttpBaseChannel's DoNotifyListener invocation of OnStartRequest looks like this:

    nsCOMPtr<nsIStreamListener> listener = mListener;
    listener->OnStartRequest(this, mListenerContext);

Whereas HttpChannelChild's invocation looks like this:

  nsresult rv = mListener->OnStartRequest(aRequest, aContext);
  if (NS_FAILED(rv)) {
    Cancel(rv);
    return;
  }

Which matters because as excerpted in comment 2, FetchDriver::OnStartRequest's error redirect logic is:

      if (mRequest->GetRedirectMode() == RequestRedirect::Error) {
        FailWithNetworkError();
        return NS_BINDING_FAILED;
      }

And HttpChannelChild::Cancel will propagate that returned NS_BINDING_FAILED to be mStatus, which will be passed to FetchDriver::OnStopRequest.

This does suggest that perhaps HttpBaseChannel::DoNotifyListener should not be ignoring the rv of its OnStartRequest invocation.
The included test crashes without the included fix if run with --disable-e10s.
e10s doesn't crash because HttpChannelChild examines the return value of the
call to the listener's OnStartRequest method and invokes Cancel() if it
NS_FAILED.  This is a divergence between e10s and non-e10s.  See the bug for
more details and discussion.
Attachment #8879936 - Flags: review?(bkelly)
Comment on attachment 8879936 [details] [diff] [review]
Fetch needs to handle redirect=error not resulting in NS_FAILED

Review of attachment 8879936 [details] [diff] [review]:
-----------------------------------------------------------------

Do we really not have test coverage of error redirect modes in WPT?  Seems like it should be there.

Thanks for fixing this!
Attachment #8879936 - Flags: review?(bkelly) → review+
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #5)
> Do we really not have test coverage of error redirect modes in WPT?  Seems
> like it should be there.

Under testing/web-platform/tests/fetch/api/redirect there's:
- redirect-mode.js
- redirect-location.js
They explicitly test error mode and that it rejects.  (It also looks like there's some other coverage in places like api/request/request-init*.sub.html.)  However, they use fetch/api/resources/redirect.py which explicitly sets "Cache-Control: no-cache" and as such they are incapable of reproducing the comment 0 crash scenario.  It's not clear they would benefit from including cache variations as currently written.

There is, however, a testing/web-platform/tests/fetch/http-cache directory.  I could see adding tests along the line of:
- fetch /redirects-to-foo that indeed redirects to /foo with redirect=follow
- fetch /redirects-to-foo with redirect=manual, ensuring that we get the redirect and not the contents of /foo
- fetch /redirects-to-foo with redirect=error, ensuring that we get an error and not the contents of /foo

Such a test would trigger the error path and wouldn't be blatantly checking a Gecko glass-box test into WPT.  (I wouldn't use those specific paths, of course, but instead augment resources/http-cache.py.)
(In reply to Andrew Sutherland [:asuth] from comment #6)
> There is, however, a testing/web-platform/tests/fetch/http-cache directory. 
> I could see adding tests along the line of:
> - fetch /redirects-to-foo that indeed redirects to /foo with redirect=follow
> - fetch /redirects-to-foo with redirect=manual, ensuring that we get the
> redirect and not the contents of /foo
> - fetch /redirects-to-foo with redirect=error, ensuring that we get an error
> and not the contents of /foo

If you have time, that would be great.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5edfe512460d3d2f5f9577e7f76c4361bc9f7292
Bug 1374943 - Fetch needs to handle redirect=error not resulting in NS_FAILED. r=bkelly
I did some additional non-hacky WPT test investigation.  Per fetch/http-cache's README.md it's specifically scoped to http://httpwg.org/specs/rfc7234.html tests and they all have clear rationales.  The RFC has effectively no discussions of redirect cache implications, which makes my proposed test feel like shoe-horning.  And the aforementioned fetch/api/redirect tests are all basically permutation tests where caching has little relevance thus far, other than our bug here.

So I stuck with just the mochitest for landing.
https://hg.mozilla.org/mozilla-central/rev/5edfe512460d
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Only a handful of ESR52 crashes in the last 3 months makes me think this isn't worth the uplift there. Should we consider this for uplift to Beta given where we are in the cycle (one Beta left later this week) or should we let it ride the 56 train at this point? Patch looks simple anyway.
Flags: needinfo?(bugmail)
Flags: in-testsuite+
I think it's sufficiently safe to uplift, and perhaps advisable.  Should I set a flag?

The crashes are likely rare because it's a combination of several things:
- Use of the fetch() API, which is growing.
- fetch()ing a redirect, pretty rare, likely to continue to be rare.
- fetch()ing a redirect with non-"follow" redirect mode, this is non-default, and likely to be rare.
- That redirect had to be cached.

It's notable that none of these are random, so a mean person could use this to crash Firefox, so there is arguably a clear benefit to uplift.
Flags: needinfo?(bugmail)
Comment on attachment 8879936 [details] [diff] [review]
Fetch needs to handle redirect=error not resulting in NS_FAILED

Approval Request Comment
[Feature/Bug causing the regression]:
Not a regression, just a longstanding edge case.

[User impact if declined]:
A fetch() of a cached redirect with the fetch operating in a non-"follow" mode will reliably cause a crash in non-e10s mode, which crashes the whole browser, by definition.

This is currently a rare idiom, but because of the involvement of cache, even someone testing on Firefox may not notice the problem.

Note that there may be another failure mode that could be more common; there is a second category of crashes that I did not analyze because the fix was the same for both.

[Is this code covered by automated tests?]:
Yes, included in the patch.

[Has the fix been verified in Nightly?]:
By the test, yes, manually, no.

[Needs manual test from QE? If yes, steps to reproduce]: 
No, it's a straightforward crash scenario.

[List of other uplifts needed for the feature/fix]:
No other uplifts required.

[Is the change risky?]:
No, it's adding a single check that enforces an otherwise-existing invariant that was subject to extensive analysis.

[Why is the change risky/not risky?]:
Extensive analysis, test added with this patch.  Additionally, web platform tests provide us with quite excellent coverage of our general, non-cache related functionality that help ensure we're not regressing our correctness.

[String changes made/needed]:
None.
Attachment #8879936 - Flags: approval-mozilla-beta?
Comment on attachment 8879936 [details] [diff] [review]
Fetch needs to handle redirect=error not resulting in NS_FAILED

fix an edge case for fetch, beta55+
Attachment #8879936 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Andrew Sutherland [:asuth] from comment #13)
> [Is this code covered by automated tests?]:
> Yes, included in the patch.
> 
> [Has the fix been verified in Nightly?]:
> By the test, yes, manually, no.
> 
> [Needs manual test from QE? If yes, steps to reproduce]: 
> No, it's a straightforward crash scenario.

Setting qe-verify- based on Andrew's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.