Handling IPC transport failures more gracefully

RESOLVED FIXED in Firefox 48

Status

()

defect
P1
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: cyu, Assigned: bobowen)

Tracking

unspecified
mozilla49
x86
Windows NT
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+, firefox46 wontfix, firefox47 wontfix, firefox48 fixed, firefox49 fixed)

Details

Attachments

(1 attachment)

We need to handle IPC transport failures more gracefully (e.g. bug 1263951 and 1258317) and to not just assert that we'll always succeed in system calls with the transports.

Updated

3 years ago
No longer blocks: 1258317
Depends on: 1258317
Assignee

Comment 1

3 years ago
I was chatting with Gabor on IRC about how this works in the sandboxed case and we realised that we could probably simply get the target process to attempt the duplication as a fall back.

This is a bit of a sticking plaster really, but it might well help.
A better fix would be trying to rework the Transport setup so that the pipes are opened in the correct process, but there is this comment at [1] that we'd need to investigate before attempting that.

You can test this patch, by simulating the failure with an addition of the following line after the INVALID_HANDLE_VALUE check in TransferHandleToProcess.

  if (XRE_IsContentProcess()) return nullptr;

[1] https://dxr.mozilla.org/mozilla-central/rev/f3f2fa1d7eed5a8262f6401ef18ff8117a3ce43e/ipc/glue/Transport_win.cpp#34

MozReview-Commit-ID: 5CKrQdDxGrg
Assignee

Updated

3 years ago
Assignee: nobody → bobowen.code
Status: NEW → ASSIGNED
Assignee

Updated

3 years ago
Attachment #8753855 - Flags: review?(gkrizsanits)
Attachment #8753855 - Flags: feedback?(cyu)
Comment on attachment 8753855 [details] [diff] [review]
If a child process fails to duplicate a TransportDescriptor pipe handle then send it anyway and get the target to duplicate

Review of attachment 8753855 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks again for this patch! r=me with a try run with the ' if (XRE_IsContentProcess()) return nullptr; ' trick mentioned above.

::: ipc/glue/Transport_win.h
@@ +41,5 @@
>      HANDLE pipe = mozilla::ipc::TransferHandleToProcess(aParam.mServerPipeHandle,
>                                                          aParam.mDestinationProcessId);
> +    DWORD duplicateFromProcessId = 0;
> +    if (!pipe) {
> +      if (XRE_IsParentProcess()) {

This case should be handled in the read function, and we should crash the child.
Attachment #8753855 - Flags: review?(gkrizsanits) → review+
Comment on attachment 8753855 [details] [diff] [review]
If a child process fails to duplicate a TransportDescriptor pipe handle then send it anyway and get the target to duplicate

Looks good to me.
Attachment #8753855 - Flags: feedback?(cyu) → feedback+
Assignee

Comment 4

3 years ago
Thanks both.

Here's a try push with the crash in the Read added (on parent duplication failure), nsAutoHandle and forcing TransferHandleToProcess to fail:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=191645eb36f4135d3779fd96c3f84a36d2c48b8f

I think all of these failures so far are to do with AWS testing problems, but it's pretty painful to work out if there are any real problems.
(In reply to Bob Owen (:bobowen) from comment #4)
> I think all of these failures so far are to do with AWS testing problems,
> but it's pretty painful to work out if there are any real problems.

Since this patch is only trying to fix something when we would crash anyway, I'm not too concerned about a few failing tests. I just wanted to see a try run for sanity check. The tests I checked from these failures are running in non-e10s mode anyway, so I'm quite sure they are unrelated.
Bob, do you think we can uplift this to beta? I'd love to see if it fixes the crash and I don't think we're risking too much with it.
Flags: needinfo?(bobowen.code)
Assignee

Comment 8

3 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #7)
> Bob, do you think we can uplift this to beta? I'd love to see if it fixes
> the crash and I don't think we're risking too much with it.

Yes, I was going to request for Beta, I also don't think there is much risk.
Just waiting for it to get merged to m-c, it landed on Saturda.
I couldn't land it Friday as inbound was closed.
Flags: needinfo?(bobowen.code)

Comment 9

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8819c74b610e
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Assignee

Comment 10

3 years ago
Comment on attachment 8753855 [details] [diff] [review]
If a child process fails to duplicate a TransportDescriptor pipe handle then send it anyway and get the target to duplicate

Note to sheriff: a slightly different version of this patch actually landed.

Approval Request Comment
[Feature/regressing bug #]:
The release assert was introduced in bug 1223240, although the previous code would probably have failed anyway, possibly also crashing.

[User impact if declined]:
We won't potentially fix the content crash in bug 1258317.

[Describe test coverage new/current, TreeHerder]:
Existing Windows TreeHerder e10s tests that run a content process, all call this code.
A try push was done (linked in comment 4) that simulated the previous failure to force it to always test the new recovery code path.

[Risks and why]:
Low: minor changes to existing code, with a new code path to attempt recovery instead of crashing.

[String/UUID change made/needed]:
None
Attachment #8753855 - Flags: approval-mozilla-beta?
Attachment #8753855 - Flags: approval-mozilla-aurora?
Comment on attachment 8753855 [details] [diff] [review]
If a child process fails to duplicate a TransportDescriptor pipe handle then send it anyway and get the target to duplicate

At this point, I would like to only accept uplifts for critical recent regressions, severe stability, security, perf, mlk issues only. This is to minimize code churn and to ensure we ship a high-quality Fx47 in 2 weeks.

Since this is e10s related, it does not meet the Beta47 uplift bar.
Attachment #8753855 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Comment on attachment 8753855 [details] [diff] [review]
If a child process fails to duplicate a TransportDescriptor pipe handle then send it anyway and get the target to duplicate

Fixes an e10s crash, Aurora48+
Attachment #8753855 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.