Closed Bug 1606901 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::net::PDocumentChannelParent::SendRedirectToRealChannel]

Categories

(Core :: Networking, defect, P1)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- unaffected
firefox73 blocking verified disabled
firefox74 + fixed

People

(Reporter: marcia, Assigned: jya)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [necko-triaged])

Crash Data

Attachments

(3 files)

This bug is for crash report bp-fc264654-c85a-44d6-b13e-789fb0200103.

Seen while looking at nightly crashes: https://bit.ly/2SRxCU2. Fairly low volume Windows crash which started in 20191205215330

All crashes have MOZ_RELEASE_ASSERT((aVar).get_PParentToChildStreamParent()) (NULL actor value passed to non-nullable param)

Possible regression range based on build id: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6989fcd6bab30c909411fbec04a4f29a78024ddd&tochange=ed92ceb9cb68d134de79f7e71aa70882e5a91f31

There are two IPC related landings in that changeset.

Top 10 frames of crashing thread:

0 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::ipc::IPCRemoteStreamType>::Write ipc/ipdl/InputStreamParams.cpp:898
1 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::ipc::IPCRemoteStreamParams>::Write ipc/ipdl/InputStreamParams.cpp:1096
2 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::ipc::InputStreamParams>::Write ipc/ipdl/InputStreamParams.cpp:2030
3 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::ipc::IPCStream>::Write ipc/ipdl/IPCStream.cpp:45
4 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::Maybe<mozilla::ipc::IPCStream> >::Write ipc/glue/IPDLParamTraits.h:250
5 xul.dll static mozilla::ipc::IPDLParamTraits<nsIInputStream*>::Write ipc/glue/IPCStreamUtils.cpp:537
6 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::dom::ReplacementChannelConfigInit>::Write ipc/ipdl/DOMTypes.cpp:2856
7 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::Maybe<mozilla::dom::ReplacementChannelConfigInit> >::Write ipc/glue/IPDLParamTraits.h:250
8 xul.dll static mozilla::ipc::IPDLParamTraits<mozilla::net::RedirectToRealChannelArgs>::Write ipc/ipdl/NeckoChannelParams.cpp:4942
9 xul.dll mozilla::net::PDocumentChannelParent::SendRedirectToRealChannel ipc/ipdl/PDocumentChannelParent.cpp:136

Adding a ni on jed to perhaps identify what might have regressed or if this assert is anything to be concerned about.

Flags: needinfo?(jld)

Looks like there are two problems here:

  1. A missing null check, probably here, in SerializeInputStreamAsPipeInternal. There are several error cases in the function it calls that return nullptr.
  2. Not checking for forbidden nulls when constructing the IPDL-generated type, only when serializing it.
Flags: needinfo?(jld)

do we know what we want to do with this? volume looks questionably high on beta.

Flags: needinfo?(gpascutto)

Looks like part 1 of comment 2 is a trivial fix, so let's take this now.

Flags: needinfo?(gpascutto)
Priority: -- → P1

This is showing up as one of the top crashes on Beta73 so far.

Jed are you taking this one?

Flags: needinfo?(jld)

I don't know this code. :baku?

(I'm not sure where this bug even belongs. Maybe Necko, because DocumentChannel seems to be involved?)

Component: IPC → Networking
Flags: needinfo?(jld) → needinfo?(amarchesini)

Jean-Yves or Matt might have thoughts here. Core: Networking is the component we've been using for DocumentChannel (bug 1556493).

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jyavenard)
Flags: needinfo?(amarchesini)
Severity: normal → critical

This needs an immediate attention.

Assignee: nobody → matt.woodrow
Whiteboard: [necko-triaged]
Assignee: matt.woodrow → jyavenard
Flags: needinfo?(jyavenard)
Flags: needinfo?(matt.woodrow)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7aee28f987c3
Test that channel is still opened before continuing. r=mattwoodrow
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(jyavenard)

Comment on attachment 9121447 [details]
Bug 1606901 - Test that channel is still opened before continuing. r?mattwoodrow

Beta/Release Uplift Approval Request

  • User impact if declined: parent will crash if for whatever reason a child process got killed
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We just check if the child is still alive before calling a method.

Been in Nightly for a few days, crashes have stopped

  • String changes made/needed: none
Flags: needinfo?(jyavenard)
Attachment #9121447 - Flags: approval-mozilla-beta?

Comment on attachment 9121447 [details]
Bug 1606901 - Test that channel is still opened before continuing. r?mattwoodrow

Fixes an Fx73 topcrash. Approved for 73.0b8.

Attachment #9121447 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

these crashes seem to be continuing in 73.0b8 unfortunately - do you rather want to reopen this bug or should i get a new one on file?

Flags: needinfo?(jyavenard)

Let's reopen.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla74 → ---

Poked at this a bit. Like what :jld said, this appears to be being caused, in part, by a lack of error checking when serializing an nsIInputStream. Specifically, it appears that the call to IPCStreamSource::Create is not checked for errors (https://searchfox.org/mozilla-central/rev/cfd1cc461f1efe0d66c2fdc17c024a203d5a2fd8/ipc/glue/InputStreamUtils.cpp#118). This lack of checking causes us to end up with an invalid IPCStream object to serialize, due to nullptr being present when it isn't allowed, which causes the crash in this bug.

This method will return nullptr if the stream which is being sent is not marked as non-blocking (or some other less-likely scenarios), so I'm guessing that our uploadStream is marked as blocking when we're trying to serialize it.

We could add checking here, and fail the input stream serialization, but I believe that would lead to a different crash, as the code is using nsIInputStream directly as a type, and the serializer for that type assumes serialization is infallible: https://searchfox.org/mozilla-central/rev/cfd1cc461f1efe0d66c2fdc17c024a203d5a2fd8/ipc/glue/IPCStreamUtils.cpp#535

I'm not entirely certain I understand why uploadStream is being sent down on the new child channel at all, but we should probably not try to change that in an uplifted patch. The easiest solution right now would probably be to switch to using the fallible IPCStream serialization methods, add the null check in the stream serialization codepath, and not attach any upload stream if it fails.

Due to the ongoing crashes in 73.0b8, Relman has asked us to disable DocumentChannel in 73.0b9. That is bug 1610888.

Depends on: 1610888

(In reply to :Nika Layzell (ni? for response) from comment #19)

I'm not entirely certain I understand why uploadStream is being sent down on the new child channel at all, but we should probably not try to change that in an uplifted patch. The easiest solution right now would probably be to switch to using the fallible IPCStream serialization methods, add the null check in the stream serialization codepath, and not attach any upload stream if it fails.

We need the upload stream on the final channel in the content process, because we use it when adding to session history.

In the process switch case, this means that we need to serialize it from the parent. The same-process case could instead get a copy from the DocumentChannelChild, but we haven't done this as we wanted to have the two code paths be as similar as possible (but could revisit that decision).

We also might want to change the way we add to session history for loads (and do it from the parent), which would fix this for all cases.

(In reply to Matt Woodrow (:mattwoodrow) from comment #21)

We also might want to change the way we add to session history for loads (and do it from the parent), which would fix this for all cases.

Yeah, doing that change would likely be part of the async session history changes we have planned.

I'm surprised we somehow have a blocking upload stream in the parent process, as theoretically we should've received it from the content process, meaning (I'm hoping), that it's guaranteed to be async. Is it possible for a channel to gain an upload stream due to a redirect or something like that?

(In reply to :Nika Layzell (ni? for response) from comment #22)

I'm surprised we somehow have a blocking upload stream in the parent process, as theoretically we should've received it from the content process, meaning (I'm hoping), that it's guaranteed to be async. Is it possible for a channel to gain an upload stream due to a redirect or something like that?

I haven't been able to find any ways to get a blocking input stream! There's code like HttpBaseChannel::EnsureUploadStreamIsCloneable which can mutate it, but still seems like that always gets a non-blocking upload stream.

No crashes in 73.0b9 and up, marking as disabled for that release.

Still no understanding on why this could actually fail.
IPCStreamSource::Create could only fail under two circumstances prior the first fix in bug 1606901.
Either the ipc channel was closed and aManager->SendPParentToChildStreamConstructor(source) would have returned nullptr.
Or IPCStreamSourceParent::Create(aInputStream) returned null.

The original fix prevented the first condition from occurring. The second can only occur if IPCStreamSourceParent::Initialize failed, which happens should the stream be blocking.

It is still unknow on how the uploadStream sets here (used for POST) could be blocking.

For now, we will just prevent the issue from occurring.

Attachment #9123816 - Attachment description: Bug 1606901 - Abort should we failed at creating IPCStreamSource. r?mattwoodrow → Bug 1606901 - P1. Abort should we failed at creating IPCStreamSource. r?mattwoodrow

From the documentation:
" While this interface is almost exclusively used with non-blocking streams, it is not necessary that nsIInputStream::isNonBlocking return true"

So we can't assume that a nsIAsyncInputStream is always non-blocking. Handle the case where it is.

Depends on D61367

Flags: needinfo?(jyavenard)
Attachment #9123818 - Attachment description: Bug 1606901 - P2. Don't assume that nsIAsyncInputStream are always non-blocking. r?baku → Bug 1606901 - P1. Don't assume that nsIAsyncInputStream are always non-blocking. r?baku
Attachment #9123816 - Attachment description: Bug 1606901 - P1. Abort should we failed at creating IPCStreamSource. r?mattwoodrow → Bug 1606901 - P2. Abort should we failed at creating IPCStreamSource. r?mattwoodrow
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/57f2c7a63fcf
P1. Don't assume that nsIAsyncInputStream are always non-blocking. r=baku

Comment on attachment 9121447 [details]
Bug 1606901 - Test that channel is still opened before continuing. r?mattwoodrow

Removing the Beta approval on this to get it off the needs-uplift radar.

Attachment #9121447 - Flags: approval-mozilla-beta+
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/09db01447338
P2. Abort should we failed at creating IPCStreamSource. r=mattwoodrow

Please specify a root cause for this bug. See :tmaity for more information.

Root Cause: --- → ?

NI for RCA

Flags: needinfo?(jyavenard)
Root Cause: ? → Coding: Unhandled Exceptions
Flags: needinfo?(jyavenard)
You need to log in before you can comment on or make changes to this bug.