Bug 1128503 (Closed) - Group ShmemTextureClient allocations of the same size into bigger shmems instead of having 1 shmem per texture
Opened 10 years ago • Closed 7 years ago
Status: RESOLVED WONTFIX
Category: Core :: Graphics: Layers, defect, P3
People: Reporter: nical, Assigned: nical
References: Depends on 1 open bug, Blocks 1 open bug
Whiteboard: [gfx-noted]
Attachments (7 files, 3 obsolete files):
1. 9.68 KB, patch (jrmuizel: review+)
2. 5.89 KB, patch
3. 41.72 KB, patch
4. 2.92 KB, patch
5. 9.98 KB, patch
6. 12.85 KB, patch
7. 29.24 KB, patch
This keeps coming up in discussions because we keep hitting the file descriptor limit (often on Mac, where the limit is rather low, but sometimes on other platforms too). So here is a bug to track it.
The two most common use cases are tiles and video frames, where it's easy to group allocations since a lot of textures have the same size.
We could go further and have a "real" allocator for arbitrary sizes, but I think we would get most of the way there with something simple that only supports allocations of a given size (one instance for tiling, and one instance per video stream).
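To make the fd arithmetic concrete, here is a small illustrative sketch; the names and numbers are made up, not from the patches:

#include <cstddef>
#include <cstdint>

// One 256x256 BGRA tile (4 bytes per pixel).
constexpr size_t kTileBytes = 256 * 256 * 4;
// Number of same-sized chunks packed into a single shmem.
constexpr uint16_t kChunksPerHeap = 16;
// One shmem of this size replaces 16 separate shmems, i.e. 16 file
// descriptors collapse into 1.
constexpr size_t kHeapBytes = kTileBytes * kChunksPerHeap;

// Chunk i lives at byte offset i * kTileBytes inside the shared buffer,
// so a (shmem id, chunk index) pair is enough to name a texture
// cross-process.
constexpr size_t ChunkOffset(uint16_t aIndex) {
  return size_t(aIndex) * kTileBytes;
}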
Comment 1 • nical (Assignee) • 9 years ago
This contains the logic for a concurrent, fixed-size, cross-process allocator working on a borrowed buffer. The idea is to wrap this into a class that provides the buffer (be it shared memory or regular memory) and manages its lifetime (that will come in a followup patch).
This only depends on mfbt, and I'd like to keep it that way (I am hesitating to put it in Moz2D instead of layers/ipc/ to ensure that). I'll add tests for it in a subsequent patch.
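A rough sketch of the "borrowed buffer" shape; the Init signature and members are guesses, not the actual patch:

#include <cstddef>
#include <cstdint>

// Sketch only: the allocator never allocates or frees the buffer itself,
// so the wrapping class can back it with a shmem for cross-process use
// or with regular heap memory for tests.
class FixedSizeAllocator {
public:
  // aBuffer is borrowed; the caller maps it (possibly in two processes)
  // and must keep it alive for the allocator's lifetime.
  bool Init(uint8_t* aBuffer, size_t aBufferSize, uint32_t aChunkSize);

private:
  uint8_t* mBuffer = nullptr;  // not owned
  uint32_t mChunkSize = 0;
  uint16_t mNumChunks = 0;
};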
Attachment #8713192 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 9 years ago
Whiteboard: [gfx-noted]
Comment 2 • Milan Sreckovic [:milan] • 9 years ago
Comment on attachment 8713192 [details] [diff] [review]
Part 1 - A simple fixed size concurrent allocator
Review of attachment 8713192 [details] [diff] [review]:
-----------------------------------------------------------------
Using release asserts works as long as this wrapping class deals with making sure the calls to the FixedClassAllocator have good arguments...
Comment 3 • nical (Assignee) • 9 years ago
The previous patch was missing part of the initialization.
Attachment #8713192 - Attachment is obsolete: true
Attachment #8713192 - Flags: review?(jmuizelaar)
Attachment #8713243 - Flags: review?(jmuizelaar)
Comment 4 • nical (Assignee) • 9 years ago
(In reply to Milan Sreckovic [:milan] from comment #2)
> Using release asserts works as long as this wrapping class deals with making
> sure the calls to the FixedClassAllocator have good arguments...
Yes. The release asserts I added correspond to what would be use-after-free bugs with a regular allocator, so I'm not taking any risks there.
Since this will run in both the parent and child processes, I can change it to crash content processes but only emit a gfxCriticalError in the parent process, so that child processes can't crash the parent by sending invalid IDs (I'll propose that in a followup patch).
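A sketch of the proposed split; the helper is hypothetical, and real code would use MOZ_RELEASE_ASSERT and gfxCriticalError rather than abort/fprintf:

#include <cstdint>
#include <cstdio>
#include <cstdlib>

// An invalid chunk id is a use-after-free-class bug, so the content
// process crashes itself, while the parent only logs and keeps running
// so a misbehaving child can't take it down.
static bool sIsParentProcess = false;  // stand-in for XRE_IsParentProcess()

void ValidateChunkIndex(uint16_t aIndex, uint16_t aNumChunks) {
  if (aIndex < aNumChunks) {
    return;  // valid id, nothing to do
  }
  if (sIsParentProcess) {
    fprintf(stderr, "FixedSizeAllocator: invalid chunk id %u\n",
            unsigned(aIndex));
  } else {
    abort();  // crash the content process on our own bugs
  }
}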
Comment 5 • nical (Assignee) • 9 years ago
Fix a bug and the busted indentation.
Attachment #8713243 - Attachment is obsolete: true
Attachment #8713243 - Flags: review?(jmuizelaar)
Attachment #8713735 - Flags: review?(jmuizelaar)
Comment 6 • nical (Assignee) • 9 years ago
Attachment #8713736 - Flags: review?(jmuizelaar)
Comment 7 • nical (Assignee) • 9 years ago
Attachment #8713737 - Flags: review?(jmuizelaar)
Comment 8 • Jeff Muizelaar [:jrmuizel] • 9 years ago
Comment on attachment 8713735 [details] [diff] [review]
Part 1 - A simple fixed size concurrent allocator
Review of attachment 8713735 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/FixedSizeAllocator.cpp
@@ +98,5 @@
> +
> +Maybe<FixedSizeAllocator::Handle>
> +FixedSizeAllocator::Allocate()
> +{
> + for (uint16_t i = 0; i < mNumChunks; ++i) {
What was the rationale for doing a search instead of maintaining a free list?
Comment 9 • nical (Assignee) • 9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
> What was the rationale for doing a search instead of maintaining a free list?
Because the search is dead simple and I expect this to be used with small numbers of chunks (like 5 or 10 per allocator for video streams, and 10 or 20 for tiles, or some other compromise between number of allocators vs. size per allocator).
It'll probably make sense to change that into maintaining a free list whenever we want to have more chunks per allocator, but that will be more complicated (although not really scary either).
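For reference, a self-contained sketch of the linear-search allocation being discussed; std::atomic and std::optional stand in for mozilla::Atomic and Maybe, and the CAS details are assumptions since only the loop header is quoted above:

#include <atomic>
#include <cstdint>
#include <optional>

struct ChunkDescriptor {
  std::atomic<bool> mState{false};  // false = free, true = in use
};

std::optional<uint16_t> Allocate(ChunkDescriptor* aChunks,
                                 uint16_t aNumChunks) {
  for (uint16_t i = 0; i < aNumChunks; ++i) {
    bool expected = false;
    // First successful CAS wins the chunk; with 10-20 chunks per
    // allocator this is at worst a few dozen atomic operations, so a
    // free list buys little.
    if (aChunks[i].mState.compare_exchange_strong(expected, true)) {
      return i;
    }
  }
  return std::nullopt;  // allocator is full
}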
Comment 10 • Jeff Muizelaar [:jrmuizel] • 9 years ago
(In reply to Nicolas Silva [:nical] from comment #9)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #8)
> > What was the rationale for doing a search instead of maintaining a free list?
>
> Because the search is dead simple and I expect this to be used with small
> numbers of chunks (like 5 or 10 per allocator for video streams, and 10 or
> 20 for tiles, or some other compromise between number of allocators vs.
> size per allocator).
>
> It'll probably make sense to change that into maintaining a free list
> whenever we want to have more chunks per allocator, but that will be more
> complicated (although not really scary either).
Can you add some documentation to the patch about this?
Comment 11 • nical (Assignee) • 9 years ago
Memory reporting coming in a followup patch.
Most of the complexity here is the IPDL bookkeeping and the need (for media) to be able to use the allocator concurrently from any thread (so there's some proxying à la TextureClient going on).
Because of bug 1208226 (mapping shared memory can fail on the compositor side), I added two ways to query the state of the allocator:
- Maybe<bool> TryIsValid() will quickly return the status if we know it.
- bool CheckIsValid() will synchronously query the status of the other process if it needs to, so it's best to avoid using it if we can.
It's not yet integrated with TextureClient (followup patch in the making).
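A sketch of how those two queries could relate; the class and the sync-query helper are hypothetical names, and std::optional stands in for mozilla::Maybe:

#include <optional>

class ShmemAllocatorClient {
public:
  // Cheap: returns the cached answer if we already know it, nullopt if
  // the other process hasn't told us whether mapping succeeded.
  std::optional<bool> TryIsValid() const { return mKnownValid; }

  // Expensive: blocks on a synchronous query to the other process when
  // the answer isn't cached, so prefer TryIsValid() where possible.
  bool CheckIsValid() {
    if (!mKnownValid) {
      mKnownValid = SyncQueryRemoteMappingSucceeded();
    }
    return *mKnownValid;
  }

private:
  // Stand-in for a synchronous IPDL round-trip to the compositor side.
  bool SyncQueryRemoteMappingSucceeded() { return false; }

  std::optional<bool> mKnownValid;
};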
Attachment #8716313 - Flags: review?(jmuizelaar)
Comment 12 • nical (Assignee) • 9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #10)
> Can you add some documentation to the patch about this?
Added:
/// We use a linear search rather than a free list because the implementation
/// is very simple and we expect to be using these allocators with small
/// numbers of chunks (about 10 to 20 chunks per allocator for media elements
/// and tiles). If we ever need a more elaborate allocation scheme we'll have
/// to add a separate allocator or modify this one.
Comment 13 • Jeff Muizelaar [:jrmuizel] • 9 years ago
I neglected this patch review last week. We've been looking at dealing with the file handle exhaustion problem directly with the patch that Lee has in bug 1245241.
Comment 14 • Jeff Muizelaar [:jrmuizel] • 9 years ago
Comment on attachment 8713735 [details] [diff] [review]
Part 1 - A simple fixed size concurrent allocator
Review of attachment 8713735 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/FixedSizeAllocator.cpp
@@ +28,5 @@
> +struct ChunkDescriptor {
> +  // mGeneration can only be written to by FixedSizeAllocator::Deallocate *before*
> +  // setting mState to false so that other threads can read the up-to-date generation
> +  // once they have acquired the chunk (by successfully setting mState to true).
> + Atomic<uint32_t> mGeneration;
I don't think mGeneration needs to be atomic. It's protected by mState (i.e. all writes to it happen after exclusive access to the block has been obtained).
Adding a comment about how mGeneration is used for detecting use-after-free scenarios would be nice.
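A minimal sketch of the protocol being discussed, with mGeneration non-atomic as suggested; std::atomic stands in for mozilla::Atomic, and the function bodies are illustrative rather than the patch itself:

#include <atomic>
#include <cstdint>

struct ChunkDescriptor {
  std::atomic<bool> mState;  // false = free, true = in use
  uint32_t mGeneration;      // guarded by mState, need not be atomic
};

bool TryAcquire(ChunkDescriptor& aDesc, uint32_t& aOutGeneration) {
  bool expected = false;
  // Acquiring mState synchronizes with the release in Deallocate, so the
  // generation bump performed before that release is visible here.
  if (!aDesc.mState.compare_exchange_strong(expected, true)) {
    return false;  // someone else owns the chunk
  }
  aOutGeneration = aDesc.mGeneration;
  return true;
}

void Deallocate(ChunkDescriptor& aDesc) {
  // Written while we still hold the chunk, before releasing mState: any
  // handle carrying the old generation can now be detected as a
  // use-after-free.
  aDesc.mGeneration++;
  aDesc.mState.store(false, std::memory_order_release);
}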
Attachment #8713735 - Flags: review?(jmuizelaar) → review+
Comment 15 • Jeff Muizelaar [:jrmuizel] • 9 years ago
Comment on attachment 8713736 [details] [diff] [review]
Part 2 - Use a magic pattern to check that the allocator is not created around random memory
Review of attachment 8713736 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/FixedSizeAllocator.cpp
@@ +16,5 @@
> // we use 8 words alignment.
>
> +static const char * const sFixedSizeAllocMagicPattern =
> + " Shmurf from this simple shmurf allocator and shmurf it safely on another process ";
> +#define MAGIC_PATTERN_SIZE 80
static const char sFixedSizeAllocMagicPattern[] = " Shmurf from this simple shmurf allocator and shmurf it safely on another process ";
You don't need a pointer to this string; just using the address of the array is fine. Then you can use something like:
#define MAGIC_PATTERN_SIZE sizeof(sFixedSizeAllocMagicPattern)
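To make the suggestion concrete, a compilable sketch; the HasMagicPattern helper is illustrative, not from the patch:

#include <cstring>

// An array (rather than a pointer) lets sizeof give the pattern's length,
// including the terminating NUL, instead of a hand-counted constant.
static const char sFixedSizeAllocMagicPattern[] =
    " Shmurf from this simple shmurf allocator and shmurf it safely on"
    " another process ";
#define MAGIC_PATTERN_SIZE sizeof(sFixedSizeAllocMagicPattern)

// On Init, reject buffers that don't start with the pattern, catching
// attempts to wrap the allocator around random memory.
bool HasMagicPattern(const char* aBuffer) {
  return memcmp(aBuffer, sFixedSizeAllocMagicPattern,
                MAGIC_PATTERN_SIZE) == 0;
}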
Attachment #8713736 - Flags: review?(jmuizelaar) → review-
Comment 16 • nical (Assignee) • 9 years ago
Review comments addressed.
Attachment #8713736 - Attachment is obsolete: true
Attachment #8722984 - Flags: review?(jmuizelaar)
Comment 17 • nical (Assignee) • 9 years ago
Attachment #8731353 - Flags: review?(jmuizelaar)
Comment 18 • nical (Assignee) • 9 years ago
Attachment #8731355 - Flags: review?(jmuizelaar)
Comment 19 • nical (Assignee) • 9 years ago
Comment 20 • nical (Assignee) • 9 years ago
I still need to tweak things a bit: figure out a proper default shmem heap size, maybe add a pref for it, look at whether and how much fragmentation is a problem, and alter the allocator's logic accordingly. I also need to add some memory reporting, integrate the allocator into ImageBridge, and add ways to check that the shmems got mapped successfully on the parent side.
With this approach we only have an allocator on the client side, so we don't need an actor pair, which works around a problem I ran into when setting up the previous approach.
I am not super enthusiastic about how the integration (Part 3) turned out: I ended up storing both the allocator and the forwarder in some places, and I haven't decided yet whether that's a good or a bad thing.
I definitely want to avoid having a singleton allocator, or even an allocator per forwarder, because I want to be able to specify different allocators for different types of content (tiles, video frames, arbitrarily sized layers, etc.) and even allocators per source of content (for video).
Jeff, when you have time to look into this, look at the "alternative approach" patches first.
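Purely as a guess at what "no singleton, no per-forwarder allocator" could look like, a sketch with entirely hypothetical names:

#include <cstdint>
#include <map>
#include <memory>
#include <utility>

class FixedSizeAllocator;  // the client-side allocator from these patches

// Each content type gets its own allocator tuned to its chunk size, and
// each video stream gets a private instance keyed by a stream id.
enum class ContentKind : uint8_t { Tiles, Layers, VideoStream };

struct AllocatorRegistry {
  // (kind, instance id) -> allocator; the id is 0 for the shared tile and
  // layer allocators, and a per-stream id for video.
  std::map<std::pair<ContentKind, uint64_t>,
           std::unique_ptr<FixedSizeAllocator>>
      mAllocators;
};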
Comment 21 • Jeff Muizelaar [:jrmuizel] • 9 years ago
(In reply to Nicolas Silva [:nical] from comment #20)
> Jeff, when you have time to look into this, look at the "alternative
> approach" patches first
I'm at GDC this week. I expect I'll spend most of next week working through my review queue.
Comment 22 • nical (Assignee) • 9 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> I'm at GDC this week. I expect I'll spend most of next week week working
> through my review queue.
No problem, enjoy the conference!
Updated • nical (Assignee) • 9 years ago
Attachment #8713737 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 9 years ago
Attachment #8716313 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 9 years ago
Attachment #8722984 - Flags: review?(jmuizelaar)
Comment 23 • nical (Assignee) • 9 years ago
I am playing with different settings for the allocator to get a feel for the overhead we get when using it for all PCompositorBridge shmem allocations.
I am setting the allocator heap size to be big enough for 10 tiles, with a separate allocator for non-tiled layers. So far it's a clear win in terms of the number of actual allocations, which may have a noticeable performance impact (I haven't actually measured that), but the memory overhead is quite large after heavy scrolling. The current scheme works well while we need a lot of tiles, but once we get rid of the tiles because there is less scrolling/painting going on, we turn out to be bad at freeing shmem heaps: the allocator, which was mostly full during scrolling, ends up struggling to get rid of its not-completely-empty heaps, looking something like:
> [...#.##...]
> [.....#####]
> [.........#]
> [#.########]
> [##...#####]
> [##########]
> [#######.##]
> [#####.####]
> [##########]
> [######....]
Where # is an occupied slot and . is an available slot, and each line is a heap (1 shmem) in the allocator (each tile texture occupies exactly one slot).
Unless there is some heavy scrolling again, tile allocations tend to remain stable and the holes stay.
It'd be nice to be able to guess which tiles/textures are going to live longer than the others, but the way tiling works makes that difficult. We'd probably have much better results if the tiling code did not try so hard to reuse tiles, not only through the tile pool but also through the TiledLayerBuffer logic. So there are still some knobs to tweak there (disabling the tile pool is not enough).
Reducing the heap size helps with the pathological cases; I am currently toying with that, along with ways to use the existing heaps more intelligently.
The allocator for non-tiled content uses far less memory (we simply allocate a lot more tiles than anything else, as far as I can see), so the memory overhead is less worrying there so far.
I also tried using one allocator for everything (tiled and non-tiled content), but that made things worse, although some cleverer scheme could make a difference.
I haven't tried using the allocator for video frames yet, but I expect that to be easier because we can use one heap per stream, with a size that corresponds to the number of frames we buffer (10, I think). The heap should be mostly full during playback and empty out quickly when the video is removed.
Long story short, there's some tuning to do for this to be worth it.
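To illustrate why those holes are costly, a small sketch with hypothetical structures (not from the patches): a heap can only be handed back to the OS when every one of its slots is free, so the overhead measured above is the free slots stuck in non-empty heaps.

#include <cstddef>
#include <vector>

struct Heap {
  std::vector<bool> mSlotUsed;  // one entry per fixed-size slot
};

// Count slots that are free but unreclaimable because their heap still
// holds at least one live texture. After heavy scrolling stops, this is
// the memory overhead described in the comment above.
size_t WastedSlots(const std::vector<Heap>& aHeaps) {
  size_t wasted = 0;
  for (const Heap& heap : aHeaps) {
    size_t used = 0;
    for (bool slot : heap.mSlotUsed) {
      used += slot ? 1 : 0;
    }
    if (used > 0) {  // heap can't be freed; its empty slots are overhead
      wasted += heap.mSlotUsed.size() - used;
    }
  }
  return wasted;
}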
Updated • 8 years ago
Priority: -- → P3
Comment 24 • nical (Assignee) • 7 years ago
I don't think this is worth pursuing for now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Updated • nical (Assignee) • 7 years ago
Attachment #8731353 - Flags: review?(jmuizelaar)
Updated • nical (Assignee) • 7 years ago
Attachment #8731355 - Flags: review?(jmuizelaar)