Last Comment Bug 776957 - OMTC: Remove unnecessary frame copies with canvas and layers acceleration
: OMTC: Remove unnecessary frame copies with canvas and layers acceleration
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla17
Assigned To: Nicolas Silva [:nical]
:
:
Mentors:
Depends on: 775226
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-24 09:24 PDT by Nicolas Silva [:nical]
Modified: 2012-08-11 19:57 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Use OpenDescriptorForDirectTexturing with CanvasLayerOGL when available. (1.38 KB, patch)
2012-08-01 11:15 PDT, Nicolas Silva [:nical]
cjones.bugs: review-
Details | Diff | Splinter Review
Use OpenDescriptorForDirectTexturing with CanvasLayerOGL when available. (1.66 KB, patch)
2012-08-09 07:46 PDT, Nicolas Silva [:nical]
cjones.bugs: review-
Details | Diff | Splinter Review
Fixes (1.40 KB, patch)
2012-08-10 01:37 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
no flags Details | Diff | Splinter Review
Revert UA changes (4.13 KB, patch)
2012-08-10 02:25 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
no flags Details | Diff | Splinter Review

Description Nicolas Silva [:nical] 2012-07-24 09:24:29 PDT
When we draw into a canvas, we draw into "normal" memory (as opposed to shared), then during a layer transaction we copy the frame into shared memory and do a swap operation synchronously. on the compositor side we upload the image to a gl texture.

This involves many image buffers and many copies, so we need to reduce the memory and cpu overhead here.

One limitation that we have is that our drawing APIs don't separate the image data from the rendering state. This means that it is very hard for us to use a DrawTarget and swap its image data while keeping drawing states.

So our solution will most likely need to involve having the canvas context always render in the same buffer.

If we use Layers acceleration, we have to upload the canvas's image to gpu memory on the compositor side, so we don't actually need to keep the CanvasSurface (which is a shared image) around after we have done the upload.
We can have the canvas context render into shmem, then during a synchronous transaction we do the upload right away on the compositor side so that on the content side we can still use the same shared buffer in our DrawTarget.

The best would maybe to be able to do upload on the content side and share the texture between content and compositor processes.
Comment 1 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-25 15:54:04 PDT
(In reply to Nicolas Silva [:nical] from comment #0)
> One limitation that we have is that our drawing APIs don't separate the
> image data from the rendering state. This means that it is very hard for us
> to use a DrawTarget and swap its image data while keeping drawing states.

An Azure DrawTarget has very little drawing state, just the transform and clip, so we could implement buffer swapping pretty easily (either in nsCanvasRenderingContext2DAzure, or as a new Azure API) --- if we have to.

If we swap buffers then we still need to copy at least the changed area of the canvas buffer between front and back (and in most performance-intensive demos the changed area is basically the whole canvas), so there's still often going to be one full memcpy of the buffer every frame.

> If we use Layers acceleration, we have to upload the canvas's image to gpu
> memory on the compositor side, so we don't actually need to keep the
> CanvasSurface (which is a shared image) around after we have done the upload.
> We can have the canvas context render into shmem, then during a synchronous
> transaction we do the upload right away on the compositor side so that on
> the content side we can still use the same shared buffer in our DrawTarget.

That seems like a reasonable approach.

For the case where the canvas is not changing (which there's actually a fair bit of on the Web), and something like gralloc is available, we could potentially use less memory by having a single gralloc buffer shared by the canvas context and the compositor.

So it might be more efficient to have a model where once the canvas buffer has been passed to the compositor, it's treated as read-only by the content side until the content needs to modify the canvas again, at which point we copy the compositor's version of the canvas into a new buffer and use that. This model potentially allows for asynchronous transactions, although there would need to be some buffer recycling so reallocating the buffer for a new frame is cheap.

Either that model or the sync-transaction-with-upload you suggested would be pretty good, so I don't have a strong opinion on which one you implement.
Comment 2 Nicolas Silva [:nical] 2012-07-26 10:29:28 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #1)
> (In reply to Nicolas Silva [:nical] from comment #0)
> > One limitation that we have is that our drawing APIs don't separate the
> > image data from the rendering state. This means that it is very hard for us
> > to use a DrawTarget and swap its image data while keeping drawing states.
> 
> An Azure DrawTarget has very little drawing state, just the transform and
> clip, so we could implement buffer swapping pretty easily (either in
> nsCanvasRenderingContext2DAzure, or as a new Azure API) --- if we have to.

The hard part is that if we want to do this, we need to manualy keep track of every clip and transformation, because some of the backends (CoreGraphcis for instance) don't provide a way to get the clips. So Copying the state from a target to another is not easy and some backends don't provide a way to swap the image buffer while keeping the drawing states.
Comment 3 Nicolas Silva [:nical] 2012-07-26 10:32:17 PDT
I'm confused, I didn't mean to change the component to color management
Comment 4 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-26 16:26:00 PDT
(In reply to Nicolas Silva [:nical] from comment #2)
> The hard part is that if we want to do this, we need to manualy keep track
> of every clip and transformation, because some of the backends (CoreGraphcis
> for instance) don't provide a way to get the clips. So Copying the state
> from a target to another is not easy and some backends don't provide a way
> to swap the image buffer while keeping the drawing states.

nsCanvasRenderingContext2DAzure already tracks all the clips, so we could support buffer-swapping at that level.
Comment 5 Nicolas Silva [:nical] 2012-08-01 11:15:41 PDT
Created attachment 648018 [details] [diff] [review]
Use OpenDescriptorForDirectTexturing with CanvasLayerOGL when available.

This is a simple enhancement that makes us take advantage of gralloc buffers or other types of "direct texturing" memory when they are used with canvas and gl layers.
There are still canvas frame copies that we want to remove.
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-01 18:33:50 PDT
Comment on attachment 648018 [details] [diff] [review]
Use OpenDescriptorForDirectTexturing with CanvasLayerOGL when available.

>diff --git a/gfx/layers/opengl/CanvasLayerOGL.cpp b/gfx/layers/opengl/CanvasLayerOGL.cpp

>+  if (nsRefPtr<TextureImage> texImage =
>+      ShadowLayerManager::OpenDescriptorForDirectTexturing(
>+        gl(), aNewFront.get_SurfaceDescriptor(), LOCAL_GL_CLAMP_TO_EDGE)) {
>+
>+    mTexImage = texImage;
>+
>+    *aNewBack = aNewFront;
>+

Following along from ShadowThebesLayerOGL, inside here we want to
 - destroy the current front buffer if |aNewFront| is different
   size/content-type
 - save a reference to |aNewFront|
 - return the old front buffer in |*aNewBack|

r- because this breaks the concurrency model between content /
compositor for the gralloc buffers.
Comment 7 Nicolas Silva [:nical] 2012-08-09 07:46:41 PDT
Created attachment 650555 [details] [diff] [review]
Use OpenDescriptorForDirectTexturing with CanvasLayerOGL when available.

try push looks good https://tbpl.mozilla.org/?tree=Try&rev=f1b67dde45e1
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-10 00:57:01 PDT
Comment on attachment 650555 [details] [diff] [review]
Use OpenDescriptorForDirectTexturing with CanvasLayerOGL when available.

Crashes on device

I/Gecko   (  363): [Parent 363] ###!!! ABORT: unknown union type: file /home/cjones/mozilla/new-b2g/objdir-gecko/ipc/ipdl/PLayersParent.cpp, line 771
F/libc    (  363): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
Comment 9 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-10 01:37:51 PDT
Created attachment 650825 [details] [diff] [review]
Fixes
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-10 01:56:06 PDT
With this patch, we go from ~35fps on TowerJelly to ~50fps, but the fps are a bit of a lie in both cases.  Too bad the user agent string changed and we're not getting honest content anymore (16x more pixels than screen size).  Will look into backing out UA change and see what real improvement is.
Comment 11 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-10 02:25:34 PDT
Created attachment 650834 [details] [diff] [review]
Revert UA changes

Surprisingly nontrivial ...
Comment 12 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-10 02:30:39 PDT
Definite framerate improvement, but not much qualitative improvement.  Measured using the wonky old fps counter, so I might have been seeing a lot of GCs that went undetected.
Comment 13 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-08-10 02:34:01 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/92ca5b751abf
Comment 14 Ryan VanderMeulen [:RyanVM] 2012-08-11 19:57:18 PDT
https://hg.mozilla.org/mozilla-central/rev/92ca5b751abf

Note You need to log in before you can comment on or make changes to this bug.