Closed Bug 565875 Opened 14 years ago Closed 14 years ago

Improve ImageLayerOGL performance by recycling buffers

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

Attachments

(4 files, 2 obsolete files)

I did some profiling of full-screen video playback on Mac (with GL) and found we were spending a lot of time in kernel page fault handling. Some of these faults were due to large memory allocations turning into mmaps. Some of those allocations are done in PlanarYCbCrImageOGL::SetData, and some are in the GL driver. We can fix the former by recycling buffers, and we can fix the latter by recycling textures. Patches ahead.
Some preliminary cleanup:
-- Use a single buffer for all planes, and manage it with an nsAutoArrayPtr so we don't have to explicitly clean up.
-- Make our mData have the right mPicX/mPicY (0) so it accurately reflects what we did
-- Make some functions non-virtual that don't need to be virtual
-- Separated copying of the Cb and Cr, this probably improves locality
Attachment #445327 - Flags: review?(bas.schouten)
Threading's a bit messed up here. We currently have a threading hazard if a CairoImageOGL is deleted off the main thread. We also have the suboptimal situation that we upload to textures every time we paint a PlanarYCbCrImageOGL. What we really need is the ability to destroy textures in the destructor, but have texture destruction postponed to run on the main thread if we're not already on the main thread. So, I do that, via TextureDeleter and DeleteTexturesOnMainThread.

To free the textures, we have to have a GLContext reference, but we have to be careful how we manage the GLContext reference here since we can't addref/release it off the main thread. So what I do is grab a reference to the GLContext on the main thread at an opportune time, and make DeleteTexturesOnMainThread responsible for releasing the reference on the main thread.

Texture deletion for PlanarYCbCrImageOGL moves to the destructor. We can then skip AllocateTextures if we've already set up textures for this image.

Also fixes a signed/unsigned warning in LayerManagerOGL.
Attachment #445328 - Flags: review?(bas.schouten)
Attached patch Part 3: Recycle buffers (obsolete) — Splinter Review
Add a stack of buffers to the ImageContainerOGL. All buffers have the same size (mRecycledBufferSize). We add a (thread-safe) API to store a buffer in the recycle list and try to get a buffer from the recycle list. Use that API in PlanarYCbCrImageOGL to save a buffer for recycling when we don't need it anymore, and to try to get a recycled buffer in SetData.
Attachment #445329 - Flags: review?(bas.schouten)
Comment on attachment 445328 [details] [diff] [review]
Part 2: fix threading issues in ImageLayersOGL

I've got better ideas for the thread safety stuff here.
Attachment #445328 - Flags: review?(bas.schouten)
(In reply to comment #3)
> Created an attachment (id=445329) [details]
> Part 3: Recycle buffers
> 
> Add a stack of buffers to the ImageContainerOGL. All buffers have the same size
> (mRecycledBufferSize). We add a (thread-safe) API to store a buffer in the
> recycle list and try to get a buffer from the recycle list. Use that API in
> PlanarYCbCrImageOGL to save a buffer for recycling when we don't need it
> anymore, and to try to get a recycled buffer in SetData.

Hrm, unless your heap manager sucks I don't think this should really matter a lot  in a build using the non-debug allocation functions (where we won't prevent memory initialization overhead by this approach).

Additionally we could also just allocate when not finding a matching buffer in TakeRecycledBuffer (maybe changing the name a bit, since it would not necessarily be recycled), preventing the mBuffer check after calling the function and cleaning the code a bit.
(In reply to comment #4)
> (From update of attachment 445328 [details] [diff] [review])
> I've got better ideas for the thread safety stuff here.

Okay, just to address my concerns with the previous patch.

I don't see any big issues, but a couple of things:

I don't see any advantage to making CairoImageOGL hold a strong reference to the manager. The manager should really be alive when SetData is called on CairoImageOGL. If this would not be the case with a weak reference we'd be in a dangerous situation anyway, since the Widget that owns the LayerManager must've been destroyed. And we'd now we destroying the LayerManager after its widget is gone. The LayerManager would then be holding a dangling pointer to its widget, which should still be 'alright' but certainly isn't a desirable situation. Personally I think we should guarantee SetData is called while the LayerManager is still alive and kicking, and then make it a weak reference.

Additionally we should be clear, for the future, that a GLContext may never rely on its widget being around for things like MakeCurrent and such(since it can now potentially be destroyed on the main thread after the widget is long gone).
(In reply to comment #1)
> Created an attachment (id=445327) [details]
> Part 1: Refactor PlanarYCbCrImageOGL
> 
> Some preliminary cleanup:
> -- Use a single buffer for all planes, and manage it with an nsAutoArrayPtr so
> we don't have to explicitly clean up.
> -- Make our mData have the right mPicX/mPicY (0) so it accurately reflects what
> we did
> -- Make some functions non-virtual that don't need to be virtual
> -- Separated copying of the Cb and Cr, this probably improves locality

This all looks good (although I didn't check if the picture rect stuff still works :)). However could we double check that the additional conditional branches we introduce aren't more expensive than any cache locality improvements we might have? This could very well actually be slower.
(In reply to comment #5)
> (In reply to comment #3)
> > Created an attachment (id=445329) [details] [details]
> > Part 3: Recycle buffers
> > 
> > Add a stack of buffers to the ImageContainerOGL. All buffers have the same size
> > (mRecycledBufferSize). We add a (thread-safe) API to store a buffer in the
> > recycle list and try to get a buffer from the recycle list. Use that API in
> > PlanarYCbCrImageOGL to save a buffer for recycling when we don't need it
> > anymore, and to try to get a recycled buffer in SetData.
> 
> Hrm, unless your heap manager sucks I don't think this should really matter a
> lot  in a build using the non-debug allocation functions (where we won't
> prevent memory initialization overhead by this approach).

It does matter a lot on Mac. These large allocations are being turned into mmap/munmap operations, then as we touch the pages they are committed, via kernel page faults. Those page faults are a large performance drain.

> Additionally we could also just allocate when not finding a matching buffer in
> TakeRecycledBuffer (maybe changing the name a bit, since it would not
> necessarily be recycled), preventing the mBuffer check after calling the
> function and cleaning the code a bit.

Yes, we could move the allocation in there, I'll do that.
(In reply to comment #6)
> I don't see any advantage to making CairoImageOGL hold a strong reference to
> the manager. The manager should really be alive when SetData is called on
> CairoImageOGL. If this would not be the case with a weak reference we'd be in a
> dangerous situation anyway, since the Widget that owns the LayerManager must've
> been destroyed. And we'd now we destroying the LayerManager after its widget is
> gone. The LayerManager would then be holding a dangling pointer to its widget,
> which should still be 'alright' but certainly isn't a desirable situation.

It should be able tolerate that since LayerManagers are refcounted. Someone might hold a reference to it until after the widget is gone.

> Personally I think we should guarantee SetData is called while the LayerManager
> is still alive and kicking, and then make it a weak reference.

That's slightly problematic for general Images since they can be created off the main thread. It's less of a problem for CairoImages.

I can probably get rid of the layer manager reference anyway.

> Additionally we should be clear, for the future, that a GLContext may never
> rely on its widget being around for things like MakeCurrent and such(since it
> can now potentially be destroyed on the main thread after the widget is long
> gone).

Yes. We have to handle this because of the threading issues: objects managed off the main thread might outlive the widget and then we still have to be able to get back to the main thread and free textures using the GLContext.

If that turns out to be impossible we could avoid it, by moving more of the texture management into GLContext so it can free up all resources when the widget goes away. But it would be more complicated.
(In reply to comment #7)
> This all looks good (although I didn't check if the picture rect stuff still
> works :)). However could we double check that the additional conditional
> branches we introduce aren't more expensive than any cache locality
> improvements we might have? This could very well actually be slower.

What extra conditional branches? You mean for the loop over the rows? There's only one per row there, and they'll predict taken.
(In reply to comment #10)
> (In reply to comment #7)
> > This all looks good (although I didn't check if the picture rect stuff still
> > works :)). However could we double check that the additional conditional
> > branches we introduce aren't more expensive than any cache locality
> > improvements we might have? This could very well actually be slower.
> 
> What extra conditional branches? You mean for the loop over the rows? There's
> only one per row there, and they'll predict taken.

Yeah, I guess the cost shouldn't be big, I'm wondering if it outweighs the cache benefit locality though. But then again, I might be all wrong!

(In reply to comment #9)
> (In reply to comment #6)
> Yes. We have to handle this because of the threading issues: objects managed
> off the main thread might outlive the widget and then we still have to be able
> to get back to the main thread and free textures using the GLContext.
> 
> If that turns out to be impossible we could avoid it, by moving more of the
> texture management into GLContext so it can free up all resources when the
> widget goes away. But it would be more complicated.

I think it's a good idea to not have the GLContext rely on the initialization widget being there. It would definitely keep things easier. Just something we should keep in mind I suppose before we get weird bugs.
I'm adding a GLTexture type that wraps a texture. It contains a GLContext ref so its destructor can clean up (delegating the cleanup to a main thread event if necessary). Using these GLTextures simplifies texture management quite a bit.

This does rely on GLContexts still working even if we keep a reference to them around for a long time. But we really do need that. Consider for example a document containing a <video> element that's playing on the screen, but then we destroy the document's presentation. The window might go away completely, but we should still be able to extract video frames from the element using canvas.drawImage. This may require access to GL resources from when the element was visible on the screen.
Attachment #445328 - Attachment is obsolete: true
Attachment #445536 - Flags: review?(bas.schouten)
Similar to the previous patch, but simpler because of the new GLTexture wrapper. Also to avoid reference cycles I move the recycling to a new object, a RecycleBin.
Attachment #445329 - Attachment is obsolete: true
Attachment #445537 - Flags: review?(bas.schouten)
Pretty straightforward.
Attachment #445538 - Flags: review?(bas.schouten)
With these patches, fullscreen video is very efficient. The only remaining efficiency win in terms of getting pixels from the decoder to the screen would be to do glTexSubImage2D in SetData instead of copying into an intermediate buffer. But that will require some form of multithreaded GL. I thought hard about introducing CGLLockContext, but actually it probably wouldn't work too well since the main thread spends a ton of time waiting in glFinish, which would have to be holding the context lock, so the decoder thread would be blocked a lot on the main thread, losing the parallelism between decoding and displaying that we currently have.

We'd really need a separate context so we can upload textures in parallel with rendering. I don't know how well that would work either. Something to experiment with in the future.
Whiteboard: [needs review]
Doing a context per thread with a dummy non-shown window that shares GL resources with some global context would be the way to go, if it works.  That solves your texture freeing problem too, but it means that we have to be very careful about allocating/deallocating resources correctly since they won't be destroyed when the context is destroyed (well, until we destroy all contexts).
(In reply to comment #16)
> Doing a context per thread with a dummy non-shown window that shares GL
> resources with some global context would be the way to go, if it works.  That
> solves your texture freeing problem too, but it means that we have to be very
> careful about allocating/deallocating resources correctly since they won't be
> destroyed when the context is destroyed (well, until we destroy all contexts).

How well-tested would this be on various OGL drivers and such?
Comment on attachment 445327 [details] [diff] [review]
Part 1: Refactor PlanarYCbCrImageOGL

> class THEBES_API PlanarYCbCrImageOGL : public PlanarYCbCrImage
> {
> public:
>   PlanarYCbCrImageOGL(LayerManagerOGL *aManager);
>-  virtual ~PlanarYCbCrImageOGL();
> 
>-  virtual void SetData(const Data &aData);
>+  void SetData(const Data &aData);

Typo? SetData() is still virtual so you should probably keep the decorator.
Comment on attachment 445327 [details] [diff] [review]
Part 1: Refactor PlanarYCbCrImageOGL

I've ported these changes to the d3d9 backend. They look good to me.
Comment on attachment 445327 [details] [diff] [review]
Part 1: Refactor PlanarYCbCrImageOGL

As Jeff said we should probably keep the virtual decorator on SetData.
Attachment #445327 - Flags: review?(bas.schouten) → review+
You're completely right, yes.
Comment on attachment 445536 [details] [diff] [review]
Part 2: fix threading

Looks fine; the longevity of contexts might be a problem, but we can figure that out when/if it comes up.
Attachment #445536 - Flags: review?(bas.schouten) → review+
Attachment #445537 - Flags: review?(bas.schouten) → review+
Attachment #445538 - Flags: review?(bas.schouten) → review+
Whiteboard: [needs review] → [needs landing]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: