Consider no longer directly associating Shmem instances with actors
Categories
(Core :: IPC, enhancement, P2)
Tracking
()
People
(Reporter: nika, Assigned: nika)
References
Details
Attachments
(1 file, 1 obsolete file)
With this new approach, Shmem
instances will now have their handles transferred inline within messages as attachments, rather than being associated with their actors and sent in separate messages.
This has a few advantages:
- The implementation is much simpler
- Releasing all references to a
Shmem
will automatically destroy it by RAII, rather than leaking the shared memory region until the toplevel actor is destroyed, removing the need for types likeRaiiShmem
. - This allows re-transmitting
Shmem
instances to another process, as we don't close the shared memory region handle upon receiving it.
But also has a disadvantage that because we keep alive the shared memory region's handle until the shmem is destroyed, so that it can be re-transmitted, we may end up using more FDs or HANDLEs while running.
This bug intentionally doesn't change or simplify callsites, removing APIs like RaiiShmem
, in order to make it easier to revert if this causes issues on platforms like Linux due to FD exhaustion. If we don't run into increased resource exhaustion problems, we can make these changes in a follow-up.
Assignee | ||
Comment 1•3 years ago
|
||
With this new approach, Shmem instances will now have their handles
transferred inline within messages as attachments, rather than being
associated with their actors and sent in separate messages.
This has a few advantages:
- The implementation is much simpler
- Releasing all references to a Shmem will automatically destroy it by
RAII, rather than leaking the shared memory region until the toplevel
actor is destroyed, removing the need for types like RaiiShmem. - This allows re-transmitting Shmem instances to another process, as we
don't close the shared memory region handle upon receiving it.
But also has a disadvantage that because we keep alive the shared memory
region's handle until the shmem is destroyed, so that it can be
re-transmitted, we may end up using more FDs or HANDLEs while running.
This patch intentionally doesn't change or simplify callsites, removing
APIs like RaiiShmem, in order to make it easier to revert if this causes
issues on platforms like Linux due to FD exhaustion. If we don't run
into increased resource exhaustion problems, we can make these changes
in a follow-up.
Depends on D140009
Comment 3•3 years ago
|
||
bugherder |
Comment 4•3 years ago
|
||
Backed out as requested.
Backout link: https://hg.mozilla.org/integration/autoland/rev/88a76c0a8d1cd349786eb1610cfa2f10907d7a1c
Updated•3 years ago
|
Comment 5•3 years ago
|
||
Comment 6•3 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:nika, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 7•3 years ago
|
||
There was a regression from the change and we still don't have any idea as to what caused it, so there's not a ton of immediate action we can take.
Assignee | ||
Comment 8•3 years ago
|
||
Err actually nevermind, wrong patch. This is blocked on bug 1767514 (and potentially other changes) for now.
Updated•3 years ago
|
Assignee | ||
Comment 9•2 years ago
|
||
We still need the ffmpeg update to land something like this, and we may want to also be a bit more clever around the media code etc.
Assignee | ||
Comment 10•2 years ago
|
||
(also around graphics potentially)
Comment 11•2 years ago
|
||
Nika, I've updated our vendored ffmepg copy to a recent version, containing the bug fix that was needed for this to not crash, maybe we can reland this?
Updated•2 years ago
|
Assignee | ||
Comment 12•2 years ago
|
||
We're slightly concerned about potential performance issues which might occur due to GMP code expecting the existing behaviour's performance characteristics, so we've been holding off on landing something like this directly. Instead, we've been adding some new Shmem types like BigBuffer
or the flexible API for unsafe shred memory from bug 1795311 which are hopefully better suited for the existing uses of Shmem
. We might want to then refactor the code out of the core IPC layer, and make it something which GMP deals with directly instead.
Comment 13•2 years ago
|
||
Nika, it sounds like this bug itself is WONTFIX and it may need a separate bug for the actions you describe?
Assignee | ||
Comment 14•2 years ago
|
||
Unlike the previous approach taken in this bug, this does not fully
decouple the Shmem
type from the IPDL actor it is sent over, but
rather makes the binding between the Shmem and the actor lazy, so that
it is performed when the type is first serialized.
This should help avoid issues with code needing to track a specific
actor through interfaces like IShmemAllocator in order to allocate
shared memory regions, without adding extra overhead to code which is
taking advantage of the existing behaviour of sending Shmem regions back
and forth using integer IDs.
For now, like with the previous patch, this does not change the API used
by any callers, and only changes the implementation under the hood. The
new implementation uses the new RawShmem.h APIs, rather than the older
SharedMemory.h APIs, even though those APIs are currently still built on
the older APIs, as they are likely to be more efficient in the future.
This new implementation is also a bit less aggressive about marking
shared memory regions as inaccessible in processes which don't call
DeallocShmem. This was always debug-only, and doesn't appear to provide
much benefit, other than requiring extra overhead/coordination between
threads/processes. This may help avoid some rare edge-case crashes with
Shmem regions which only occur in debug builds (e.g. due to an actor
unexpectedly dying while code on another thread is reading a Shmem
region).
A Shmem region which is destroyed before being sent will be
automatically cleaned up after these changes, but code still needs to
explicitly call Dealloc
on any Shmems which are transferred over IPC,
just like they do without the patch. I spent some time investigating
doing a form of automatic collection across processes, however decided
it would be impractical due to IPC messages being dropped without being
deserialized when sent to destroyed managed actors, meaning that
transferred references within IPC messages could be easily leaked.
It may be possible to do something simpler which only does the reference
counting within a single process for non-unsafe shmems, however that is
not implemented here, and likely wouldn't work for unsafe shmem regions.
It may be worth investigating if in the future we are able to migrate
unsafe shmem to use other mechanisms like those in RawShmem.
Updated•2 years ago
|
Assignee | ||
Comment 15•2 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)
Nika, it sounds like this bug itself is WONTFIX and it may need a separate bug for the actions you describe?
I've put up some patches for a new approach for this bug, which I hope will help improve the UX of Shmem substantially while not having the same potential perf pitfalls of the previous approach.
Description
•