Open Bug 1757802 Opened 2 years ago Updated 7 months ago

Consider no longer directly associating Shmem instances with actors

Categories

(Core :: IPC, enhancement, P2)

enhancement

Tracking

()

REOPENED

People

(Reporter: nika, Assigned: nika)

References

(Regressed 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

With this new approach, Shmem instances will now have their handles transferred inline within messages as attachments, rather than being associated with their actors and sent in separate messages.

This has a few advantages:

  • The implementation is much simpler
  • Releasing all references to a Shmem will automatically destroy it by RAII, rather than leaking the shared memory region until the toplevel actor is destroyed, removing the need for types like RaiiShmem.
  • This allows re-transmitting Shmem instances to another process, as we don't close the shared memory region handle upon receiving it.

But also has a disadvantage that because we keep alive the shared memory region's handle until the shmem is destroyed, so that it can be re-transmitted, we may end up using more FDs or HANDLEs while running.

This bug intentionally doesn't change or simplify callsites, removing APIs like RaiiShmem, in order to make it easier to revert if this causes issues on platforms like Linux due to FD exhaustion. If we don't run into increased resource exhaustion problems, we can make these changes in a follow-up.

With this new approach, Shmem instances will now have their handles
transferred inline within messages as attachments, rather than being
associated with their actors and sent in separate messages.

This has a few advantages:

  • The implementation is much simpler
  • Releasing all references to a Shmem will automatically destroy it by
    RAII, rather than leaking the shared memory region until the toplevel
    actor is destroyed, removing the need for types like RaiiShmem.
  • This allows re-transmitting Shmem instances to another process, as we
    don't close the shared memory region handle upon receiving it.

But also has a disadvantage that because we keep alive the shared memory
region's handle until the shmem is destroyed, so that it can be
re-transmitted, we may end up using more FDs or HANDLEs while running.

This patch intentionally doesn't change or simplify callsites, removing
APIs like RaiiShmem, in order to make it easier to revert if this causes
issues on platforms like Linux due to FD exhaustion. If we don't run
into increased resource exhaustion problems, we can make these changes
in a follow-up.

Depends on D140009

Pushed by nlayzell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0d23532d6a49
Don't keep alive Shmem shared memory regions on IProtocol, r=ipc-reviewers,jld
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch
Regressions: 1765299
Regressions: 1765776
Regressions: 1766003
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 101 Branch → ---
Flags: needinfo?(nika)
No longer regressions: 1765393

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:nika, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(nika)
Flags: needinfo?(jld)

There was a regression from the change and we still don't have any idea as to what caused it, so there's not a ton of immediate action we can take.

Flags: needinfo?(nika)

Err actually nevermind, wrong patch. This is blocked on bug 1767514 (and potentially other changes) for now.

Depends on: 1767514
No longer regressions: 1765480
See Also: → 1765480

We still need the ffmpeg update to land something like this, and we may want to also be a bit more clever around the media code etc.

Flags: needinfo?(jld)

(also around graphics potentially)

Depends on: 1765480
See Also: 1765480

Nika, I've updated our vendored ffmepg copy to a recent version, containing the bug fix that was needed for this to not crash, maybe we can reland this?

Flags: needinfo?(nika)
Severity: -- → S2

We're slightly concerned about potential performance issues which might occur due to GMP code expecting the existing behaviour's performance characteristics, so we've been holding off on landing something like this directly. Instead, we've been adding some new Shmem types like BigBuffer or the flexible API for unsafe shred memory from bug 1795311 which are hopefully better suited for the existing uses of Shmem. We might want to then refactor the code out of the core IPC layer, and make it something which GMP deals with directly instead.

Flags: needinfo?(nika)

Nika, it sounds like this bug itself is WONTFIX and it may need a separate bug for the actions you describe?

Flags: needinfo?(nika)

Unlike the previous approach taken in this bug, this does not fully
decouple the Shmem type from the IPDL actor it is sent over, but
rather makes the binding between the Shmem and the actor lazy, so that
it is performed when the type is first serialized.

This should help avoid issues with code needing to track a specific
actor through interfaces like IShmemAllocator in order to allocate
shared memory regions, without adding extra overhead to code which is
taking advantage of the existing behaviour of sending Shmem regions back
and forth using integer IDs.

For now, like with the previous patch, this does not change the API used
by any callers, and only changes the implementation under the hood. The
new implementation uses the new RawShmem.h APIs, rather than the older
SharedMemory.h APIs, even though those APIs are currently still built on
the older APIs, as they are likely to be more efficient in the future.

This new implementation is also a bit less aggressive about marking
shared memory regions as inaccessible in processes which don't call
DeallocShmem. This was always debug-only, and doesn't appear to provide
much benefit, other than requiring extra overhead/coordination between
threads/processes. This may help avoid some rare edge-case crashes with
Shmem regions which only occur in debug builds (e.g. due to an actor
unexpectedly dying while code on another thread is reading a Shmem
region).

A Shmem region which is destroyed before being sent will be
automatically cleaned up after these changes, but code still needs to
explicitly call Dealloc on any Shmems which are transferred over IPC,
just like they do without the patch. I spent some time investigating
doing a form of automatic collection across processes, however decided
it would be impractical due to IPC messages being dropped without being
deserialized when sent to destroyed managed actors, meaning that
transferred references within IPC messages could be easily leaked.

It may be possible to do something simpler which only does the reference
counting within a single process for non-unsafe shmems, however that is
not implemented here, and likely wouldn't work for unsafe shmem regions.
It may be worth investigating if in the future we are able to migrate
unsafe shmem to use other mechanisms like those in RawShmem.

Attachment #9266291 - Attachment is obsolete: true

(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)

Nika, it sounds like this bug itself is WONTFIX and it may need a separate bug for the actions you describe?

I've put up some patches for a new approach for this bug, which I hope will help improve the UX of Shmem substantially while not having the same potential perf pitfalls of the previous approach.

Flags: needinfo?(nika)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: