Closed Bug 1565405 Opened 5 years ago Closed 5 years ago

Async actor construction failures don't notify anywhere

Categories

(Core :: IPC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: mattwoodrow, Assigned: nika)

References

Details

Attachments

(1 file)

As discovered in bug 1562616, returning nullptr from an AllocPActor function (for an async ctor) doesn't cause any failures.

It looks like the generated code in OnMessageReceived just returns a MsgValueError, MaybeHandleError prints that to stderr (not in crash reports), and then the default ProcessingError() impl does nothing with it.

This means that by default, failing to construct an async actor isn't observable in any way.

Bug 1191976 made us crash the receiving process whenever we fail to deserialize a message, which includes the case where we had a reference to the actor that failed to create.

This makes it really easy to crash the parent by using an actor that wasn't actually valid, and the resulting crash report only points to the first use of it.

Should we also crash the parent when the actor fails to create, so that we get better stacks for the actual cause?

See Also: → 1562616
Flags: needinfo?(nika)

Unfortunately IPC is full of footguns like this, and I'm still trying to figure out how to handle them in the general case.

  1. We really don't want to crash when something fails to deserialize, ideally we'd just kill the content process. Unfortunately, as bug 1191976 found, that causes issues where we get useless content process crash reports with no info about what failed or why. If there was a way to annotate the content process' crash report before crashing it from the parent process with the relevant info, it would make for a much better experience all-around.

  2. Being able to send actor references over IPC is definitely something I'm still struggling with out to make non-crashy. :jld and I have been talking about requiring PFoo* actors to be sent within some sort of FallibleActor<PFoo> wrapper type (which would be implicitly-converted to from PFoo*), requiring you to unwrap past an error type. The error would be present if the actor doesn't exist in the parent process anymore, making this sort of thing not a crash unless it needs to be. Errors like the content process straight-up lying about stuff (like the type of the actor) would probably still be deserialization failures.

    This sort of fallible serialization might also be useful for deserializing dead BrowsingContext objects.

  3. Async actor ctors used to crash in that sort of situation (sometimes), but it was unreliable whether they would crash, and they could crash unpredictably (e.g. it was possible to get a racy crash during shutdown). We generally tried to move away from crashing because it produced unexpected results and was often responsible for really frustrating user crashes. Unfortunately it doesn't interact well with constructing managed actors of already-destroyed actors. Perhaps we need to make that fail more aggressively again?

  4. We should probably crash as we send a destroyed IPDL actor - wouldn't be too hard to do. Perhaps that's what we can do in this bug.

Flags: needinfo?(nika)

ni? :mattwoodrow in case he has any opinions here.

Flags: needinfo?(matt.woodrow)

The previous behaviour of failing unconditionally was performed as, during
shutdown, the channel could become unable to send without worker threads having
a chance to react. This change keeps that behaviour, isolating async message
senders from impending IPC shutdown, while performing expected actor teardown if
the manager actor has already been destroyed, and should no longer send messages.

An alternate behaviour here could be to crash if !Manager()->CanSend(). That
behaviour may be preferable if a sufficient number of callsites don't check the
return value of the SendPFooConstructor() method.

Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cae8c48da596
Fail async ctor call if `!Manager()->CanSend()`, r=froydnj
Assignee: nobody → nika
Status: NEW → ASSIGNED
Priority: -- → P2
Type: enhancement → defect

Solutions 1 and 2 both sound great, and also a lot of work.

I don't think the current patch would have helped for bug 1562616 since in that case the constructor case failed on the receiving side. We end up in a weird mismatched state where the sending side thinks it has a fully constructed actor, and the receiving side thinks that it doesn't. Next time we try to send across a reference to this zombie-actor, the receiving side crashes as it doesn't exist.

Flags: needinfo?(matt.woodrow)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: