Bug 1128503 (Closed) - Group ShmemTextureClient allocations of the same size into bigger shmems instead of having 1 shmem per texture
Opened 10 years ago • Closed 7 years ago
Status: RESOLVED WONTFIX
Category: Core :: Graphics: Layers, defect, P3
People: Reporter: nical, Assigned: nical
References: Depends on 1 open bug, Blocks 1 open bug
Whiteboard: [gfx-noted]
Attachments (7 files, 3 obsolete files):
1. 9.68 KB, patch (jrmuizel: review+)
2. 5.89 KB, patch
3. 41.72 KB, patch
4. 2.92 KB, patch
5. 9.98 KB, patch
6. 12.85 KB, patch
7. 29.24 KB, patch
This keeps coming up in discussions because we keep hitting the file descriptor limit (often on Mac, where the limit is rather low, but sometimes on other platforms too). So here is a bug to track it.
The two most common use cases are tiles and video frames, where it's easy to group allocations since a lot of textures have the same size.
We could go further and have a "real" allocator for arbitrary sizes, but I think we would get most of the way there with something simple that only supports allocations of a given size (one instance for tiling, and one instance per video stream).
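To make the fd arithmetic concrete, here is a small illustrative sketch; the names and numbers are made up, not from the patches:

#include <cstddef>
#include <cstdint>

// One 256x256 BGRA tile (4 bytes per pixel).
constexpr size_t kTileBytes = 256 * 256 * 4;
// Number of same-sized chunks packed into a single shmem.
constexpr uint16_t kChunksPerHeap = 16;
// One shmem of this size replaces 16 separate shmems, i.e. 16 file
// descriptors collapse into 1.
constexpr size_t kHeapBytes = kTileBytes * kChunksPerHeap;

// Chunk i lives at byte offset i * kTileBytes inside the shared buffer,
// so a (shmem id, chunk index) pair is enough to name a texture
// cross-process.
constexpr size_t ChunkOffset(uint16_t aIndex) {
  return size_t(aIndex) * kTileBytes;
}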
Comment 1 • nical (Assignee) • 9 years ago
This contains the logic for a concurrent, fixed-size, cross-process allocator working on a borrowed buffer. The idea is to wrap this into a class that provides the buffer (be it shared memory or regular memory) and manages its lifetime (that will come in a followup patch).
This only depends on mfbt, and I'd like to keep it that way (I am hesitating to put it in Moz2D instead of layers/ipc/ to ensure that). I'll add tests for it in a subsequent patch.
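A rough sketch of the "borrowed buffer" shape; the Init signature and members are guesses, not the actual patch:

#include <cstddef>
#include <cstdint>

// Sketch only: the allocator never allocates or frees the buffer itself,
// so the wrapping class can back it with a shmem for cross-process use
// or with regular heap memory for tests.
class FixedSizeAllocator {
public:
  // aBuffer is borrowed; the caller maps it (possibly in two processes)
  // and must keep it alive for the allocator's lifetime.
  bool Init(uint8_t* aBuffer, size_t aBufferSize, uint32_t aChunkSize);

private:
  uint8_t* mBuffer = nullptr;  // not owned
  uint32_t mChunkSize = 0;
  uint16_t mNumChunks = 0;
};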
Attachment #8713192 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 9 years ago
Whiteboard: [gfx-noted]
Comment 2 • Milan Sreckovic [:milan] • 9 years ago
Comment on attachment 8713192 [details] [diff] [review]
Part 1 - A simple fixed size concurrent allocator
Review of attachment 8713192 [details] [diff] [review]:
-----------------------------------------------------------------
Using release asserts works as long as this wrapping class deals with making sure the calls to the FixedClassAllocator have good arguments...
Comment 3 • nical (Assignee) • 9 years ago
The previous patch was missing part of the initialization.
Attachment #8713192 - Attachment is obsolete: true
Attachment #8713192 - Flags: review?(jmuizelaar)
Attachment #8713243 - Flags: review?(jmuizelaar)
Comment 4 • nical (Assignee) • 9 years ago
(In reply to Milan Sreckovic [:milan] from comment #2)
> Using release asserts works as long as this wrapping class deals with making
> sure the calls to the FixedClassAllocator have good arguments...
Yes. The release asserts I added correspond to what would be use-after-free bugs with a regular allocator, so I'm not taking any risks there.
Since this will run in both the parent and child processes, I can change it to crash content processes but only emit a gfxCriticalError in the parent process, so that child processes can't crash the parent by sending invalid IDs (I'll propose that in a followup patch).
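A sketch of the proposed split; the helper is hypothetical, and real code would use MOZ_RELEASE_ASSERT and gfxCriticalError rather than abort/fprintf:

#include <cstdint>
#include <cstdio>
#include <cstdlib>

// An invalid chunk id is a use-after-free-class bug, so the content
// process crashes itself, while the parent only logs and keeps running
// so a misbehaving child can't take it down.
static bool sIsParentProcess = false;  // stand-in for XRE_IsParentProcess()

void ValidateChunkIndex(uint16_t aIndex, uint16_t aNumChunks) {
  if (aIndex < aNumChunks) {
    return;  // valid id, nothing to do
  }
  if (sIsParentProcess) {
    fprintf(stderr, "FixedSizeAllocator: invalid chunk id %u\n",
            unsigned(aIndex));
  } else {
    abort();  // crash the content process on our own bugs
  }
}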
Comment 5 • nical (Assignee) • 9 years ago
Fix a bug and the busted indentation.
Attachment #8713243 - Attachment is obsolete: true
Attachment #8713243 - Flags: review?(jmuizelaar)
Attachment #8713735 - Flags: review?(jmuizelaar)
Comment 6 • nical (Assignee) • 9 years ago
Attachment #8713736 - Flags: review?(jmuizelaar)
Comment 7 • nical (Assignee) • 9 years ago
Attachment #8713737 - Flags: review?(jmuizelaar)
Comment 8 • Jeff Muizelaar [:jrmuizel] • 9 years ago
Comment on attachment 8713735 [details] [diff] [review]
Part 1 - A simple fixed size concurrent allocator
Review of attachment 8713735 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/FixedSizeAllocator.cpp
@@ +98,5 @@
> +
> +Maybe<FixedSizeAllocator::Handle>
> +FixedSizeAllocator::Allocate()
> +{
> + for (uint16_t i = 0; i < mNumChunks; ++i) {
What was the rationale for doing a search instead of maintaining a free list?
Comment 9 • nical (Assignee) • 9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
> What was the rationale for doing a search instead of maintaining a free list?
Because the search is dead simple and I expect this to be used with small numbers of chunks (like 5 or 10 per allocator for video streams, and 10 or 20 for tiles, or some other compromise between number of allocators vs. size per allocator).
It'll probably make sense to change that into maintaining a free list whenever we want to have more chunks per allocator, but that will be more complicated (although not really scary either).
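For reference, a self-contained sketch of the linear-search allocation being discussed; std::atomic and std::optional stand in for mozilla::Atomic and Maybe, and the CAS details are assumptions since only the loop header is quoted above:

#include <atomic>
#include <cstdint>
#include <optional>

struct ChunkDescriptor {
  std::atomic<bool> mState{false};  // false = free, true = in use
};

std::optional<uint16_t> Allocate(ChunkDescriptor* aChunks,
                                 uint16_t aNumChunks) {
  for (uint16_t i = 0; i < aNumChunks; ++i) {
    bool expected = false;
    // First successful CAS wins the chunk; with 10-20 chunks per
    // allocator this is at worst a few dozen atomic operations, so a
    // free list buys little.
    if (aChunks[i].mState.compare_exchange_strong(expected, true)) {
      return i;
    }
  }
  return std::nullopt;  // allocator is full
}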
Comment 10 • Jeff Muizelaar [:jrmuizel] • 9 years ago
(In reply to Nicolas Silva [:nical] from comment #9)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
> > What was the rationale for doing a search instead of maintaining a free list?
>
> Because the search is dead simple and I expect this to be used with small
> numbers of chunks (like 5 or 10 per allocator for video streams, and 10 or
> 20 for tiles, or some other compromise between number of allocators vs.
> size per allocator).
>
> It'll probably make sense to change that into maintaining a free list
> whenever we want to have more chunks per allocator, but that will be more
> complicated (although not really scary either).
Can you add some documentation to the patch about this?
Comment 11 • nical (Assignee) • 9 years ago
Memory reporting coming in a followup patch.
Most of the complexity here is the IPDL bookkeeping and the need (for media) to be able to use the allocator concurrently from any thread (so there's some proxying à la TextureClient going on).
Because of bug 1208226 (mapping shared memory can fail on the compositor side), I added two ways to query the state of the allocator:
- Maybe<bool> TryIsValid() will quickly return the status if we know it.
- bool CheckIsValid() will synchronously query the status of the other process if it needs to, so it's best to avoid using it if we can.
It's not yet integrated with TextureClient (followup patch in the making).
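A sketch of how those two queries could relate; the class and the sync-query helper are hypothetical names, and std::optional stands in for mozilla::Maybe:

#include <optional>

class ShmemAllocatorClient {
public:
  // Cheap: returns the cached answer if we already know it, nullopt if
  // the other process hasn't told us whether mapping succeeded.
  std::optional<bool> TryIsValid() const { return mKnownValid; }

  // Expensive: blocks on a synchronous query to the other process when
  // the answer isn't cached, so prefer TryIsValid() where possible.
  bool CheckIsValid() {
    if (!mKnownValid) {
      mKnownValid = SyncQueryRemoteMappingSucceeded();
    }
    return *mKnownValid;
  }

private:
  // Stand-in for a synchronous IPDL round-trip to the compositor side.
  bool SyncQueryRemoteMappingSucceeded() { return false; }

  std::optional<bool> mKnownValid;
};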
Attachment #8716313 - Flags: review?(jmuizelaar)
Comment 12 • nical (Assignee) • 9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #10)
> Can you add some documentation to the patch about this?
Added:
/// We use a linear search rather than a free list because the implementation
/// is very simple and we expect to be using these allocators with small
/// numbers of chunks (about 10 to 20 chunks per allocator for media elements
/// and tiles). If we ever need a more elaborate allocation scheme we'll have
/// to add a separate allocator or modify this one.
Comment 13 • Jeff Muizelaar [:jrmuizel] • 9 years ago
I neglected this patch review last week. We've been looking at dealing with the file handle exhaustion problem directly with the patch that Lee has in bug 1245241.
Comment 14 • Jeff Muizelaar [:jrmuizel] • 9 years ago
Comment on attachment 8713735 [details] [diff] [review]
Part 1 - A simple fixed size concurrent allocator
Review of attachment 8713735 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/FixedSizeAllocator.cpp
@@ +28,5 @@
> +struct ChunkDescriptor {
> +  // mGeneration can only be written to by FixedSizeAllocator::Deallocate *before*
> +  // setting mState to false so that other threads can read the up-to-date generation
> +  // once they have acquired the chunk (by successfully setting mState to true).
> + Atomic<uint32_t> mGeneration;
I don't think mGeneration needs to be atomic. It's protected by mState (i.e. all writes to it happen after exclusive access to the block has been obtained).
Adding a comment about how mGeneration is used for detecting use-after-free scenarios would be nice.
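A minimal sketch of the protocol being discussed, with mGeneration non-atomic as suggested; std::atomic stands in for mozilla::Atomic, and the function bodies are illustrative rather than the patch itself:

#include <atomic>
#include <cstdint>

struct ChunkDescriptor {
  std::atomic<bool> mState;  // false = free, true = in use
  uint32_t mGeneration;      // guarded by mState, need not be atomic
};

bool TryAcquire(ChunkDescriptor& aDesc, uint32_t& aOutGeneration) {
  bool expected = false;
  // Acquiring mState synchronizes with the release in Deallocate, so the
  // generation bump performed before that release is visible here.
  if (!aDesc.mState.compare_exchange_strong(expected, true)) {
    return false;  // someone else owns the chunk
  }
  aOutGeneration = aDesc.mGeneration;
  return true;
}

void Deallocate(ChunkDescriptor& aDesc) {
  // Written while we still hold the chunk, before releasing mState: any
  // handle carrying the old generation can now be detected as a
  // use-after-free.
  aDesc.mGeneration++;
  aDesc.mState.store(false, std::memory_order_release);
}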
Attachment #8713735 - Flags: review?(jmuizelaar) → review+
Comment 15 • Jeff Muizelaar [:jrmuizel] • 9 years ago
Comment on attachment 8713736 [details] [diff] [review]
Part 2 - Use a magic pattern to check that the allocator is not created around random memory
Review of attachment 8713736 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/FixedSizeAllocator.cpp
@@ +16,5 @@
> // we use 8 words alignment.
>
> +static const char * const sFixedSizeAllocMagicPattern =
> + " Shmurf from this simple shmurf allocator and shmurf it safely on another process ";
> +#define MAGIC_PATTERN_SIZE 80
static const char sFixedSizeAllocMagicPattern[] = " Shmurf from this simple shmurf allocator and shmurf it safely on another process ";
You don't need a pointer to this string; just using the address of the array is fine. Then you can use something like:
#define MAGIC_PATTERN_SIZE sizeof(sFixedSizeAllocMagicPattern)
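To make the suggestion concrete, a compilable sketch; the HasMagicPattern helper is illustrative, not from the patch:

#include <cstring>

// An array (rather than a pointer) lets sizeof give the pattern's length,
// including the terminating NUL, instead of a hand-counted constant.
static const char sFixedSizeAllocMagicPattern[] =
    " Shmurf from this simple shmurf allocator and shmurf it safely on"
    " another process ";
#define MAGIC_PATTERN_SIZE sizeof(sFixedSizeAllocMagicPattern)

// On Init, reject buffers that don't start with the pattern, catching
// attempts to wrap the allocator around random memory.
bool HasMagicPattern(const char* aBuffer) {
  return memcmp(aBuffer, sFixedSizeAllocMagicPattern,
                MAGIC_PATTERN_SIZE) == 0;
}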
Attachment #8713736 - Flags: review?(jmuizelaar) → review-
Comment 16 • nical (Assignee) • 9 years ago
Review comments addressed.
Attachment #8713736 - Attachment is obsolete: true
Attachment #8722984 - Flags: review?(jmuizelaar)
Comment 17 • nical (Assignee) • 9 years ago
Attachment #8731353 - Flags: review?(jmuizelaar)
Comment 18 • nical (Assignee) • 9 years ago
Attachment #8731355 - Flags: review?(jmuizelaar)
Comment 19 • nical (Assignee) • 9 years ago
Comment 20 • nical (Assignee) • 9 years ago
I still need to tweak things a bit: figure out a proper default shmem heap size, maybe add a pref for it, look at whether and how much fragmentation is a problem, and alter the allocator's logic accordingly. I also need to add some memory reporting, integrate the allocator into ImageBridge, and add ways to check that the shmems got mapped successfully on the parent side.
With this approach we only have an allocator on the client side, so we don't need an actor pair, which works around a problem I ran into when setting up the previous approach.
I am not super enthusiastic about how the integration (Part 3) turned out: I ended up storing both the allocator and the forwarder in some places, and I haven't decided yet whether that's a good or a bad thing.
I definitely want to avoid having a singleton allocator, or even an allocator per forwarder, because I want to be able to specify different allocators for different types of content (tiles, video frames, arbitrarily sized layers, etc.) and even allocators per source of content (for video).
Jeff, when you have time to look into this, look at the "alternative approach" patches first.
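Purely as a guess at what "no singleton, no per-forwarder allocator" could look like, a sketch with entirely hypothetical names:

#include <cstdint>
#include <map>
#include <memory>
#include <utility>

class FixedSizeAllocator;  // the client-side allocator from these patches

// Each content type gets its own allocator tuned to its chunk size, and
// each video stream gets a private instance keyed by a stream id.
enum class ContentKind : uint8_t { Tiles, Layers, VideoStream };

struct AllocatorRegistry {
  // (kind, instance id) -> allocator; the id is 0 for the shared tile and
  // layer allocators, and a per-stream id for video.
  std::map<std::pair<ContentKind, uint64_t>,
           std::unique_ptr<FixedSizeAllocator>>
      mAllocators;
};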
Comment 21 • Jeff Muizelaar [:jrmuizel] • 9 years ago
(In reply to Nicolas Silva [:nical] from comment #20)
> Jeff, when you have time to look into this, look at the "alternative
> approach" patches first
I'm at GDC this week. I expect I'll spend most of next week working through my review queue.
Comment 22 • nical (Assignee) • 9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> I'm at GDC this week. I expect I'll spend most of next week week working
> through my review queue.
No problem, enjoy the conference!
Updated • nical (Assignee) • 9 years ago
Attachment #8713737 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 9 years ago
Attachment #8716313 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 9 years ago
Attachment #8722984 - Flags: review?(jmuizelaar)
Comment 23 • nical (Assignee) • 9 years ago
I am playing with different settings for the allocator to get a feel for the overhead we get when using it for all PCompositorBridge shmem allocations.
I am setting the allocator heap size to be big enough for 10 tiles, with a separate allocator for non-tiled layers. So far it's a clear win in terms of the number of actual allocations, which may have a noticeable performance impact (I haven't actually measured that), but the memory overhead is quite large after heavy scrolling. The current scheme works well while we need a lot of tiles, but once we get rid of the tiles because there is less scrolling/painting going on, we turn out to be bad at freeing shmem heaps: the allocator, which was mostly full during scrolling, ends up struggling to get rid of its not-completely-empty heaps, looking something like:
> [...#.##...]
> [.....#####]
> [.........#]
> [#.########]
> [##...#####]
> [##########]
> [#######.##]
> [#####.####]
> [##########]
> [######....]
Where # is an occupied slot and . is an available slot, and each line is a heap (1 shmem) in the allocator (each tile texture occupies exactly one slot).
Unless there is some heavy scrolling again, tile allocations tend to remain stable and the holes stay.
It'd be nice to be able to guess which tiles/textures are going to live longer than the others, but the way tiling works makes that difficult. We'd probably have much better results if the tiling code did not try so hard to reuse tiles, not only through the tile pool but also through the TiledLayerBuffer logic. So there are still some knobs to tweak there (disabling the tile pool is not enough).
Reducing the heap size helps with the pathological cases; I am currently toying with that, along with ways to use the existing heaps more intelligently.
The allocator for non-tiled content uses far less memory (we simply allocate a lot more tiles than anything else, as far as I can see), so the memory overhead is less worrying there so far.
I also tried using one allocator for everything (tiled and non-tiled content), but that made things worse, although some cleverer scheme could make a difference.
I haven't tried using the allocator for video frames yet, but I expect that to be easier because we can use one heap per stream, with a size that corresponds to the number of frames we buffer (10, I think). The heap should be mostly full during playback and empty out quickly when the video is removed.
Long story short, there's some tuning to do for this to be worth it.
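To illustrate why those holes are costly, a small sketch with hypothetical structures (not from the patches): a heap can only be handed back to the OS when every one of its slots is free, so the overhead measured above is the free slots stuck in non-empty heaps.

#include <cstddef>
#include <vector>

struct Heap {
  std::vector<bool> mSlotUsed;  // one entry per fixed-size slot
};

// Count slots that are free but unreclaimable because their heap still
// holds at least one live texture. After heavy scrolling stops, this is
// the memory overhead described in the comment above.
size_t WastedSlots(const std::vector<Heap>& aHeaps) {
  size_t wasted = 0;
  for (const Heap& heap : aHeaps) {
    size_t used = 0;
    for (bool slot : heap.mSlotUsed) {
      used += slot ? 1 : 0;
    }
    if (used > 0) {  // heap can't be freed; its empty slots are overhead
      wasted += heap.mSlotUsed.size() - used;
    }
  }
  return wasted;
}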
Updated • 8 years ago
Priority: -- → P3
Comment 24 • nical (Assignee) • 7 years ago
I don't think this is worth pursuing for now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Updated • nical (Assignee) • 7 years ago
Attachment #8731353 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 7 years ago
Attachment #8731355 - Flags: review?(jmuizelaar)