Last Comment Bug 649525 - WebGL layer compositing through the BasicCanvasLayer is very slow (desktop version)
: WebGL layer compositing through the BasicCanvasLayer is very slow (desktop ve...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: unspecified
: Other All
: -- normal (vote)
: mozilla8
Assigned To: Scott Ruff
:
Mentors:
http://webglsamples.googlecode.com/hg...
Depends on:
Blocks: 811115
  Show dependency treegraph
 
Reported: 2011-04-12 16:39 PDT by Scott Ruff
Modified: 2012-11-13 12:43 PST (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch for Firefox 4 release source (18.53 KB, patch)
2011-04-28 16:20 PDT, Scott Ruff
no flags Details | Diff | Splinter Review
Patch for top of tree (21.01 KB, patch)
2011-04-28 16:20 PDT, Scott Ruff
no flags Details | Diff | Splinter Review
Updated Firefox 4 patch (24.00 KB, patch)
2011-05-16 09:19 PDT, Scott Ruff
no flags Details | Diff | Splinter Review
Update to previous patch (26.08 KB, patch)
2011-05-19 10:43 PDT, Scott Ruff
no flags Details | Diff | Splinter Review
Remove custom XRender function made changes to force the fast path through Cairo (16.51 KB, patch)
2011-05-24 15:10 PDT, Scott Ruff
jmuizelaar: review-
Details | Diff | Splinter Review
Another update based on comments (14.32 KB, patch)
2011-06-13 11:57 PDT, Scott Ruff
jmuizelaar: review+
Details | Diff | Splinter Review
Patch merged with top of tree (13.91 KB, patch)
2011-07-08 06:58 PDT, Scott Ruff
jmuizelaar: review+
Details | Diff | Splinter Review

Description Scott Ruff 2011-04-12 16:39:27 PDT
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
Build Identifier: 4.0

I am attempting to optimize the desktop version of firefox to run WebGL well on a mobile device running linux, X11 and EGL.

WebGL Compositing through the BasicCanvasLayer is very slow due to a read pixels call, and the final Fill via cairo, which appears to not be accelerated on X11. On some mobile devices I have seen combined times of over 250 ms per frame.

Experimental code which returns an offscreen pixmap render context from the EGL context provider, and makes this surface directly available to BasicCanvasLayer::Update can eliminate the ReadPixels.

Replacing the cairo rectangle/fill calls with accelerated calls to XRenderComposite lower the final composite step to a couple of ms.

The above can result in as much as 10x frame rate improvement in some cases.

Why not just use the HW layers path you might ask? On our mobile platform XRender is well accelerated and stable. My initial attempts to turn on HW layers using an optimized EGL path that rendered directly to textures was not stable, and was not faster. I understand that the HW layers path is the long term architecture, and will likely work to optimize that path for our device at a later time. Optimizing the xrender path will allow us to enable webgl on shipping devices sooner.


Reproducible: Always

Steps to Reproduce:
1. I am using WebGL aquarium as my WebGL test app.http://webglsamples.googlecode.com/hg/aquarium/aquarium.html
2. Configure app for 50 fish, and turn off all the options such as bubbles, light rays, normals, but leave on the tank. 
Actual Results:  
Full screen on our device, the frame rate is only 2 fps.

Expected Results:  
It can go faster. With experimental code, this test case can run at 9 fps.

I am planning to assign this bug to myself.
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2011-04-12 17:23:50 PDT
Currently, WebGL contexts render to a FBO whose color attachment is a texture. (See GLContext::ResizeOffscreenFBO in gfx/thebes/GLContext.cpp).

Do you mean that you have a way of directly using OpenGL textures as pixmaps? Or do you plan to change the way that WebGL contexts work to no longer render to such FBOs and instead be backed by pixmaps? In the latter case, what is your plan for WebGL canvas resizing? Using a FBO is very convenient as that means that when the canvas gets resized, we only have to recreate the FBO.
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2011-04-12 17:55:07 PDT
Oh, OK, here's how you could solve that anyway: first create your pixmap, then use texture_from_pixmap to interprete it as a texture and use that to back the WebGL context's FBO.
Comment 3 Scott Ruff 2011-04-13 08:11:53 PDT
To be clear I should state that I am optimizing a unique path. The path is desktop firefox, on linux, on a mobile device with EGL support only, and running the non GL layer compositing path.

The current EGL code looks to be written mostly for Android and Nokia. I'm not sure if anyone else has tried to use EGL with desktop.

So on to your questions.

What I have done in my experimental code is modify the EGL context provider to create a pixmap, and then call eglCreatePixmapSurface with that pixmap. This creates an offscreen pixmap surface, which is used as the render target of the context. In this case there is no FBO and no texture. Since I'm optimizing the non GL layers path, I don't need a texture and I can use the offscreen surface directly to a call to XRenderComposite to perform the final composite.

As for the resize, I don't think I'm handling that, so I'll need to do that.

I'm trying to figure out how to integrate these optimizations in a clean way without breaking anything else. I expect I might want to add a build option. I'm still working on that, but hope to post at least a first pass patch this week.
Comment 4 Benoit Jacob [:bjacob] (mostly away) 2011-04-14 02:05:19 PDT
(In reply to comment #3)
> To be clear I should state that I am optimizing a unique path. The path is
> desktop firefox, on linux, on a mobile device with EGL support only, and
> running the non GL layer compositing path.
> 
> The current EGL code looks to be written mostly for Android and Nokia. I'm not
> sure if anyone else has tried to use EGL with desktop.

Sure: on Windows, we use ANGLE, which provides a GLESv2/EGL implementation on top of D3D9. In this way, EGL is being used on the desktop. In the future we plan to use it also on Mac/Linux but that's not happening at the moment.

> 
> So on to your questions.
> 
> What I have done in my experimental code is modify the EGL context provider to
> create a pixmap, and then call eglCreatePixmapSurface with that pixmap. This
> creates an offscreen pixmap surface, which is used as the render target of the
> context. In this case there is no FBO and no texture. Since I'm optimizing the
> non GL layers path, I don't need a texture and I can use the offscreen surface
> directly to a call to XRenderComposite to perform the final composite.
> 
> As for the resize, I don't think I'm handling that, so I'll need to do that.

OK, so, that makes sense, and you indeed still have the resizing problem to solve. See my suggestion in comment 2.

> 
> I'm trying to figure out how to integrate these optimizations in a clean way
> without breaking anything else.

https://developer.mozilla.org/en/mozilla_automated_testing
Make sure to do 'make reftest' to check for regressions around Layers code, and you can also check for WebGL regressions using the WebGL mochitest:
TEST_PATH=content/canvas/test/webgl/test_webgl_conformance_test_suite.html make mochitest-plain

> I expect I might want to add a build option.

Is this planned to land on mozilla-central?
Comment 5 Benoit Jacob [:bjacob] (mostly away) 2011-04-14 02:11:34 PDT
(In reply to comment #0)
> Why not just use the HW layers path you might ask? On our mobile platform
> XRender is well accelerated and stable. My initial attempts to turn on HW
> layers using an optimized EGL path that rendered directly to textures was not
> stable, and was not faster.
> I understand that the HW layers path is the long
> term architecture, and will likely work to optimize that path for our
> device at a later time. Optimizing the xrender path will allow us to enable
> webgl on shipping devices sooner.

Until very recently, GL layers on X were slow because we had to read back pixmaps to memory before uploading them as textures.

That just got fixed: bug 640082. So it might be worth re-evaluating GL layers now. Our current plan is to enable them by default on X11 in Firefox 6 (release around August/September).
Comment 6 Scott Ruff 2011-04-14 15:51:37 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > To be clear I should state that I am optimizing a unique path. The path is
> > desktop firefox, on linux, on a mobile device with EGL support only, and
> > running the non GL layer compositing path.
> > 
> > The current EGL code looks to be written mostly for Android and Nokia. I'm not
> > sure if anyone else has tried to use EGL with desktop.
> 
> Sure: on Windows, we use ANGLE, which provides a GLESv2/EGL implementation on
> top of D3D9. In this way, EGL is being used on the desktop. In the future we
> plan to use it also on Mac/Linux but that's not happening at the moment.
> 
> > 
> > So on to your questions.
> > 
> > What I have done in my experimental code is modify the EGL context provider to
> > create a pixmap, and then call eglCreatePixmapSurface with that pixmap. This
> > creates an offscreen pixmap surface, which is used as the render target of the
> > context. In this case there is no FBO and no texture. Since I'm optimizing the
> > non GL layers path, I don't need a texture and I can use the offscreen surface
> > directly to a call to XRenderComposite to perform the final composite.
> > 
> > As for the resize, I don't think I'm handling that, so I'll need to do that.
> 
> OK, so, that makes sense, and you indeed still have the resizing problem to
> solve. See my suggestion in comment 2.
> 
> > 
> > I'm trying to figure out how to integrate these optimizations in a clean way
> > without breaking anything else.
> 
> https://developer.mozilla.org/en/mozilla_automated_testing
> Make sure to do 'make reftest' to check for regressions around Layers code, and
> you can also check for WebGL regressions using the WebGL mochitest:
> TEST_PATH=content/canvas/test/webgl/test_webgl_conformance_test_suite.html make
> mochitest-plain

Thanks for the pointers

> 
> > I expect I might want to add a build option.
> 
> Is this planned to land on mozilla-central?

I'm not 100% sure what you mean. JP Rosevear was encouraging us to go through the Mozilla process, and try to get our changes in the main code line. It is possible that you or Jeff could decide that it's too special case and shouldn't go in.
Comment 7 Scott Ruff 2011-04-14 16:12:52 PDT
(In reply to comment #5)
> (In reply to comment #0)
> > Why not just use the HW layers path you might ask? On our mobile platform
> > XRender is well accelerated and stable. My initial attempts to turn on HW
> > layers using an optimized EGL path that rendered directly to textures was not
> > stable, and was not faster.
> > I understand that the HW layers path is the long
> > term architecture, and will likely work to optimize that path for our
> > device at a later time. Optimizing the xrender path will allow us to enable
> > webgl on shipping devices sooner.
> 
> Until very recently, GL layers on X were slow because we had to read back
> pixmaps to memory before uploading them as textures.
> 
> That just got fixed: bug 640082. So it might be worth re-evaluating GL layers
> now. Our current plan is to enable them by default on X11 in Firefox 6 (release
> around August/September).

I am familiar with that bug. My first approach was to modify the EGL context provider slightly to use the extensions our hardware supports, and avoid any copies to texture. Just to see if I could speed up general (Not Webgl) frame rates. I was able to get it running. I ran into a couple problems. First, with a backing surface larger than the dirty rect, areas outside the dirty rect were being cleared. I added a clip call and that fixed it. There was also a crash in rendering that I commented out but didn't root cause. The main problem was that the performance was not improved. It looked like the drawing of the scroll bars and other UI was taking a pure software path, It seemed that the painting of these areas need to paint to a colormap/palletized surface, but I'm providing an RGB texture backed surface, so they take a software path and then copy. It is also possible that our XRenderComposite acceleration is faster than our general texture path, but I don't have any data either way.

I really want a solution in my pocket ASAP for a June timeframe, and then I can go back to working on the HW layer path.

Again, if you would prefer that this be kept out of the main line we can discuss. Obviously it would help if I posted some code. I'm figuring out how to modify the EGL code cleanly.
Comment 8 Benoit Jacob [:bjacob] (mostly away) 2011-04-15 11:21:32 PDT
(In reply to comment #7)
> Again, if you would prefer that this be kept out of the main line we can
> discuss. Obviously it would help if I posted some code. I'm figuring out how to
> modify the EGL code cleanly.

I had a chat with Jeff and we agreed that by default, we would like that to land on mozilla-central (aka mainline), but of course we need to see the patch before we can give you a 100% guarantee.
Comment 9 Scott Ruff 2011-04-28 16:20:02 PDT
Created attachment 528985 [details] [diff] [review]
Patch for Firefox 4 release source

I'm finally posting my patches for this bug. I am posting two versions, one for firefox 4 release code, and one for mozilla central.

The patch results in 3x - 25x frame rate improvement on a tegra, Harmony like device, depending on texture usage and complexity.

The patch adds a --enable-egl-xrender-composite build flag which obviously is only useful for builds that use EGL on X.

I have tried to make this change as clean as possible, However, I realize you might want me to put the xrender code somewhere else. Hopefully I'm close. 

As for testing I attempted to test x86 linux (without my option enabled) to make sure I didn't break anything. The reftest ran with the same results with and without my patch. the webgl mochitest did not run with or without my changes. The first 280ish tests ran but then all the rest started timing out. There is an error stating that proxy server is refusing connection. I posted a question to #QA but didn't get a response.

As for testing on a tegra device that seems even more complicated. I cross compile so all the paths in the makefiles are wrong. I wrote a script to replace them. Also there seem to be a lot of links to the host path as well. I have not got any test running on the device yet but I'll probably put more effort into it if I can get them running on x86 linux.

I'd like to have the interested parties review the changes, and revisit tests once we think the code is close.
Comment 10 Scott Ruff 2011-04-28 16:20:52 PDT
Created attachment 528987 [details] [diff] [review]
Patch for top of tree
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2011-04-29 04:59:41 PDT
Thanks for posting the patch. Assigning review to Jeff ensures they eventually get reviewed (he's away until May 3). Myself I'm not competent to review Layers code.

I suppose that the only patch we'll take is the one for mozilla-central. We only take trivial bug fixes for Firefox 4.

> the webgl mochitest did not run with or without my
> changes. The first 280ish tests ran but then all the
> rest started timing out.

I got this error myself last week; will revert to you when I find back what the solution was.

> As for testing on a tegra device that seems even more complicated.
> I cross compile so all the paths in the makefiles are wrong.

I've been worried about that too and haven't yet run tests on mobile. I think that the #mobile channel on irc.mozilla.org should be able to help you with that.
Comment 12 Jeff Muizelaar [:jrmuizel] 2011-05-05 13:53:52 PDT
I'll try to get to this soon. I'm still catching up from being on vacation.
Comment 13 Scott Ruff 2011-05-10 09:40:11 PDT
Jeff, I'm planning to submit an updated patch that fixes my failure to respect the current clip rect in BasicCanvasLayer::PaintWith Opacity, and a few other changes. If you haven't already started reviewing, please wait for the update.
Comment 14 Scott Ruff 2011-05-16 09:19:46 PDT
Created attachment 532670 [details] [diff] [review]
Updated Firefox 4 patch

I have submitted an updated patch for Firefox 4 that fixes some bugs in the previous patch. This patch also uses xrender to speed 2D canvas drawImage calls in some cases. I can create a separate bug for that if desired, but they are very related.

I'm submitting just a firefox 4 patch since that is what we plan to deploy in the near term. I'd like to iterate on this patch, and then if Mozilla wants to take the patch into TOT, do one final merge.
Comment 15 Scott Ruff 2011-05-19 10:43:12 PDT
Created attachment 533706 [details] [diff] [review]
Update to previous patch

Take fast composite path when premultiplied alpha is false but opaque is true.
Fixed a drawImage bug.
Comment 16 Jeff Muizelaar [:jrmuizel] 2011-05-20 06:52:48 PDT
Comment on attachment 533706 [details] [diff] [review]
Update to previous patch

Scott, I was looking this over and just realized something. Is there any reason we need to composite using XRender directly instead of just using the drawable through cairo (cairo_xlib_surface_create)? If so that should avoid a bunch of the manual work that CompositeWithXRender does and also avoid having to expose cairo_has_clip.
Comment 17 Benoit Jacob [:bjacob] (mostly away) 2011-05-20 10:06:37 PDT
(In reply to comment #9)
> As for testing I attempted to test x86 linux (without my option enabled) to
> make sure I didn't break anything. The reftest ran with the same results
> with and without my patch. the webgl mochitest did not run with or without
> my changes. The first 280ish tests ran but then all the rest started timing
> out. There is an error stating that proxy server is refusing connection. I
> posted a question to #QA but didn't get a response.

We just recently figured what the problem was, see bug 658359. It has a patch which you can apply, anyway I'll land it soon. Basically the .frag/.vert file types used in this mochitest were unknown so we were iterating over plugins to see if any would handle them, in my case eventually crashing the http server process in the VLC/Totem video plugin.
Comment 18 Scott Ruff 2011-05-20 15:42:52 PDT
Jeff I agree it would be much preferred for this all to happen in Cairo.
I need to figure out why fill doesn't take a fast xrender path.
Comment 19 Scott Ruff 2011-05-24 15:10:49 PDT
Created attachment 534904 [details] [diff] [review]
Remove custom XRender function made changes to force the fast path through Cairo

I have made two changes to make Cairo take a fast XRender path which indeed makes things much cleaner, assuming the changes are acceptable.

First in BasicCanvasLayer::PaitWithOpacity, the surface extend flag was set to PAD which is not accelerated by XRender. I don't see an obvious reason for it to be set to PAD so I have changed it to NONE. This alone allows the final composite to be accelerated through Cairo.

The fill used by drawImage can fail to take the fast XRender path when the destination x, y, width, height are not integers. For example it fails for the GuiMark2 bitmap test. Conditional on my build option, I round x, y, width, height to the nearest integer in Cairo _composite_rectangle so that it will take the fast path. I hope this acceptable because it produces a 20x fps improvement on the above benchmark.
Comment 20 Siarhei Siamashka 2011-05-26 04:23:52 PDT
(In reply to comment #19)
> First in BasicCanvasLayer::PaitWithOpacity, the surface extend flag was set
> to PAD which is not accelerated by XRender. I don't see an obvious reason
> for it to be set to PAD so I have changed it to NONE. This alone allows the
> final composite to be accelerated through Cairo.

Do you mean PAD is not properly accelerated on your hardware with your graphics driver?

This sounds bad, just because according to radeon drivers devopers [1][2] and nouveau [3], NONE is supposed to be quite difficult to accelerate in hardware and PAD was recommended instead. Correct NONE implementation also happens to be a bit difficult for software rendering, and certain types of NONE repeat are not optimized in pixman at the moment.

So basically everyone else seems to prefer PAD. Is anybody looking into fixing this PAD acceleration problem in the graphics drivers of your mobile device?


1. https://bugs.freedesktop.org/show_bug.cgi?id=27954#c4
2. https://bugzilla.mozilla.org/show_bug.cgi?id=581797
3. http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/tree/src/nv50_exa.c#n558
Comment 21 Scott Ruff 2011-05-27 10:23:40 PDT
Thank you for that info. Given that, I'm not comfortable with that change either.

Our drivers support XRender 0.10 wich supports pad, but the X version is 10600000 and cairo marks the pad buggy flag if version less that 10699000. If I comment out the check it runs fine.

I'll put PAd back, and then I'm wondering if the best way to deal with this is just make a two line patch to disable this check in customer builds for our device. The alternative is #ifdef and add more complexity to an area of code that is already complicated.
Comment 22 Jeff Muizelaar [:jrmuizel] 2011-06-03 13:14:14 PDT
(In reply to comment #21)
> Thank you for that info. Given that, I'm not comfortable with that change
> either.
> 
> Our drivers support XRender 0.10 wich supports pad, but the X version is
> 10600000 and cairo marks the pad buggy flag if version less that 10699000.
> If I comment out the check it runs fine.
> 
> I'll put PAd back, and then I'm wondering if the best way to deal with this
> is just make a two line patch to disable this check in customer builds for
> our device. The alternative is #ifdef and add more complexity to an area of
> code that is already complicated.

I think it might be worth adding #ifdefs around the check in cairo as documentation of the problem. The patch then becomes even simpler.
Comment 23 Jeff Muizelaar [:jrmuizel] 2011-06-03 13:30:03 PDT
Comment on attachment 534904 [details] [diff] [review]
Remove custom XRender function made changes to force the fast path through Cairo


>     nsRefPtr<gfxImageSurface> isurf =
>       new gfxImageSurface(gfxIntSize(mBounds.width, mBounds.height),
>                           (GetContentFlags() & CONTENT_OPAQUE)
>@@ -949,14 +965,6 @@
>     }
> 
>     NS_ASSERTION(isurf->Stride() == mBounds.width * 4, "gfxImageSurface stride isn't what we expect!");
>-
>-    // We need to read from the GLContext
>-    mGLContext->MakeCurrent();
>-
>-    // We have to flush to ensure that any buffered GL operations are
>-    // in the framebuffer before we read.
>-    mGLContext->fFlush();
>-

Why are these going away?

>     PRUint32 currentFramebuffer = 0;
> 
>     mGLContext->fGetIntegerv(LOCAL_GL_FRAMEBUFFER_BINDING, (GLint*)&currentFramebuffer);
>@@ -987,10 +995,12 @@
>     // stick our surface into mSurface, so that the Paint() path is the same
>     mSurface = isurf;
>   }
>+  }
> 
>   // sanity
>   NS_ASSERTION(mUpdatedRect.IsEmpty() || mBounds.Contains(mUpdatedRect),
>                "CanvasLayer: Updated rect bigger than bounds!");
>+                             
> }
> 
> void
>@@ -1009,7 +1019,9 @@
>   nsRefPtr<gfxPattern> pat = new gfxPattern(mSurface);
> 
>   pat->SetFilter(mFilter);
>-  pat->SetExtend(gfxPattern::EXTEND_PAD);
>+    
>+  // Note: EXTEND_NONE can be accelerated by XRender
>+  pat->SetExtend(gfxPattern::EXTEND_NONE);   

As discussed, this will need to go.

> 
>   gfxMatrix m;
>   if (mNeedsYFlip) {
>@@ -1018,12 +1030,35 @@
>     aContext->Scale(1.0, -1.0);
>   }
> 
>+  // If content opaque, then save off current operator and set to source.
>+  // This ensures that alpha is not applied even if the source surface
>+  // has an alpha channel
>+  gfxContext::GraphicsOperator savedOp;
>+  if (GetContentFlags() & CONTENT_OPAQUE) {
>+    savedOp = aContext->CurrentOperator();
>+    aContext->SetOperator(gfxContext::OPERATOR_SOURCE);    
>+  }

Is this needed for correctness or performance? I don't think we need it on other platforms.
 
Overall, there a bunch of #ifdef's that would be nice to avoid, but I'm not sure how practical that will be. Also, there a bunch of little cosmetic changes, that should be avoided or a separate patch.

>--- ff_clean/mozilla-2.0/gfx/cairo/cairo/src/cairo-surface-fallback.c	2011-03-18 17:33:40.000000000 -0600
>+++ ff_clean/mozilla-2.0_applied6-1/gfx/cairo/cairo/src/cairo-surface-fallback.c	2011-05-24 14:45:33.332262767 -0600
>@@ -723,6 +723,15 @@
>     if (traps->num_traps > 1 || ! traps->is_rectilinear || ! traps->maybe_region)
> 	return CAIRO_INT_STATUS_UNSUPPORTED;
> 
>+#if defined (MOZ_X11) && defined (MOZ_EGL_XRENDER_COMPOSITE)
>+  // Round parameters to integers so that we can take the fast path. This can provide
>+  // a 23X improvement over the software path for mobile XRender devices 
>+  traps->traps[0].top         = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].top) + .5));
>+	traps->traps[0].bottom      = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].bottom) + .5));
>+	traps->traps[0].left.p1.x   = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].left.p1.x) + .5));
>+	traps->traps[0].right.p1.x  = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].right.p1.x) + .5));
>+#endif
>+

I don't think this is a legitimate thing to do. If we need rounded parameters, we should be doing it at higher level.
Also, _cairo_fixed_round is a better choice.
Comment 24 Scott Ruff 2011-06-03 16:53:38 PDT
(In reply to comment #23)
> Comment on attachment 534904 [details] [diff] [review] [review]
> Remove custom XRender function made changes to force the fast path through
> Cairo
> 
> 
> >     nsRefPtr<gfxImageSurface> isurf =
> >       new gfxImageSurface(gfxIntSize(mBounds.width, mBounds.height),
> >                           (GetContentFlags() & CONTENT_OPAQUE)
> >@@ -949,14 +965,6 @@
> >     }
> > 
> >     NS_ASSERTION(isurf->Stride() == mBounds.width * 4, "gfxImageSurface stride isn't what we expect!");
> >-
> >-    // We need to read from the GLContext
> >-    mGLContext->MakeCurrent();
> >-
> >-    // We have to flush to ensure that any buffered GL operations are
> >-    // in the framebuffer before we read.
> >-    mGLContext->fFlush();
> >-
> 
> Why are these going away?

They were moved and modified to the following. I would assert that in the current version of the code, the readPixels is really forcing all gl commands to execute. Since this new path doesn't do a readPixels I changed the flush to a finish to ensure that all command are complete before compositing.

+    mGLContext->MakeCurrent();
+    mGLContext->fFinish();

> 
> >     PRUint32 currentFramebuffer = 0;
> > 
> >     mGLContext->fGetIntegerv(LOCAL_GL_FRAMEBUFFER_BINDING, (GLint*)&currentFramebuffer);
> >@@ -987,10 +995,12 @@
> >     // stick our surface into mSurface, so that the Paint() path is the same
> >     mSurface = isurf;
> >   }
> >+  }
> > 
> >   // sanity
> >   NS_ASSERTION(mUpdatedRect.IsEmpty() || mBounds.Contains(mUpdatedRect),
> >                "CanvasLayer: Updated rect bigger than bounds!");
> >+                             
> > }
> > 
> > void
> >@@ -1009,7 +1019,9 @@
> >   nsRefPtr<gfxPattern> pat = new gfxPattern(mSurface);
> > 
> >   pat->SetFilter(mFilter);
> >-  pat->SetExtend(gfxPattern::EXTEND_PAD);
> >+    
> >+  // Note: EXTEND_NONE can be accelerated by XRender
> >+  pat->SetExtend(gfxPattern::EXTEND_NONE);   
> 
> As discussed, this will need to go.

yes. Understood.

> 
> > 
> >   gfxMatrix m;
> >   if (mNeedsYFlip) {
> >@@ -1018,12 +1030,35 @@
> >     aContext->Scale(1.0, -1.0);
> >   }
> > 
> >+  // If content opaque, then save off current operator and set to source.
> >+  // This ensures that alpha is not applied even if the source surface
> >+  // has an alpha channel
> >+  gfxContext::GraphicsOperator savedOp;
> >+  if (GetContentFlags() & CONTENT_OPAQUE) {
> >+    savedOp = aContext->CurrentOperator();
> >+    aContext->SetOperator(gfxContext::OPERATOR_SOURCE);    
> >+  }
> 
> Is this needed for correctness or performance? I don't think we need it on
> other platforms.

I believe it is needed for correctness. In the Update method, the current readPixels path creates a RBG (No alpha) surface when copying the GL result for an OPAQUE canvas. This eliminates the alpha in the source surface before the composite, so even if the Cairo OP is OVER, no blending occurs. In the new path the source surface has an alpha channel, and if I don't change the OP, some pixels can get double blended. I saw this with the ice blocks in angry birds.

>  
> Overall, there a bunch of #ifdef's that would be nice to avoid, but I'm not
> sure how practical that will be. Also, there a bunch of little cosmetic
> changes, that should be avoided or a separate patch.

 I don't like the #ifdefs either, but I'm not sure how to get around them.

I did see some line break changes in the patch which I'll remove.

> 
> >--- ff_clean/mozilla-2.0/gfx/cairo/cairo/src/cairo-surface-fallback.c	2011-03-18 17:33:40.000000000 -0600
> >+++ ff_clean/mozilla-2.0_applied6-1/gfx/cairo/cairo/src/cairo-surface-fallback.c	2011-05-24 14:45:33.332262767 -0600
> >@@ -723,6 +723,15 @@
> >     if (traps->num_traps > 1 || ! traps->is_rectilinear || ! traps->maybe_region)
> > 	return CAIRO_INT_STATUS_UNSUPPORTED;
> > 
> >+#if defined (MOZ_X11) && defined (MOZ_EGL_XRENDER_COMPOSITE)
> >+  // Round parameters to integers so that we can take the fast path. This can provide
> >+  // a 23X improvement over the software path for mobile XRender devices 
> >+  traps->traps[0].top         = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].top) + .5));
> >+	traps->traps[0].bottom      = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].bottom) + .5));
> >+	traps->traps[0].left.p1.x   = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].left.p1.x) + .5));
> >+	traps->traps[0].right.p1.x  = _cairo_fixed_from_double( (long)(_cairo_fixed_to_double(traps->traps[0].right.p1.x) + .5));
> >+#endif
> >+
> 
> I don't think this is a legitimate thing to do. If we need rounded
> parameters, we should be doing it at higher level.
> Also, _cairo_fixed_round is a better choice.

My first concern is that the rounding is out of spec and will cause some artifacts. I looked at the W3C spec, and don't really see any specifications on filtering. The tests I've done look good and the frame rate is 20X. So in your opinion is it functionally legitimate to round?

If the answer is yes, then I really want to do the rounding only for drawImage. I can put back a minimal #ifdef in nsCanvasRenderingContext2D.cpp at around line 3495 to do the rounding before I call Cairo, but then I'm exposing a cairo xrender/backend limitation way up at the canvas level. Again, maybe some of this should be short lived private patches. I'm open to any suggestions you have.
Comment 25 Benoit Jacob [:bjacob] (mostly away) 2011-06-03 19:15:15 PDT
(In reply to comment #24)
> They were moved and modified to the following. I would assert that in the
> current version of the code, the readPixels is really forcing all gl
> commands to execute. Since this new path doesn't do a readPixels I changed
> the flush to a finish to ensure that all command are complete before
> compositing.
> 
> +    mGLContext->MakeCurrent();
> +    mGLContext->fFinish();

Have you investigated if Flush() might be enough here? Is it the fact that content rendering and compositing run in 2 different processes, that is forcing you to call Finish() instead of Flush() here?
Comment 26 Scott Ruff 2011-06-06 11:37:50 PDT
(In reply to comment #25)
> (In reply to comment #24)
> > They were moved and modified to the following. I would assert that in the
> > current version of the code, the readPixels is really forcing all gl
> > commands to execute. Since this new path doesn't do a readPixels I changed
> > the flush to a finish to ensure that all command are complete before
> > compositing.
> > 
> > +    mGLContext->MakeCurrent();
> > +    mGLContext->fFinish();
> 
> Have you investigated if Flush() might be enough here? Is it the fact that
> content rendering and compositing run in 2 different processes, that is
> forcing you to call Finish() instead of Flush() here?

I'm making these changes to desktop firefox, and I thought it was all one process.

I have experimented with using flush instead of finish and in no cases am I seeing any rendering artifacts. I haven't been able to break any apps. However I think there is a theoretical race condition since flush doesn't ensure that all gl commands have been executed. I think there could still be pending commands in the GL pipe when PaintWithOpacity starts compositing.

There is a noticeable performance difference between flush and finish. Running google's dynamic cube map demo performance is 7-10 fps with the finish and 8-11 fps with the flush. Again this suggests that gl is not done when the composite starts. We might be getting away with it since the composite is starting at the top, and gl is finishing rasterizing the bottom.

For the existing readPixels path we don't need a flush or finish because the readPixels does an implicit finish. So the decision shouldn't affect that path either way.

I'd prefer the performance of the flush, but I'd worry that some combination of app and hardware will expose a bug.
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2011-06-06 12:43:35 PDT
(In reply to comment #26)
> I'm making these changes to desktop firefox, and I thought it was all one
> process.

Oh yes it is, sorry about the confusion. Somehow I thought that was about mobile.

> 
> I have experimented with using flush instead of finish and in no cases am I
> seeing any rendering artifacts. I haven't been able to break any apps.
> However I think there is a theoretical race condition since flush doesn't
> ensure that all gl commands have been executed. I think there could still be
> pending commands in the GL pipe when PaintWithOpacity starts compositing.

My understanding is that while flush() returns before all GL commands have executed, it ensures that all pending GL commands will be executed before any subsequent GL commands that depend their rendering. However, I absolutely don't know if this applies to non-GL commands, such as accessing pixmaps used with texture-from-pixmap.

> I'd prefer the performance of the flush, but I'd worry that some combination
> of app and hardware will expose a bug.

Maybe you can talk to the right people at NVIDIA to check if what you're using for compositing is part of what flush() makes work.
Comment 28 Scott Ruff 2011-06-13 11:57:35 PDT
Created attachment 538971 [details] [diff] [review]
Another update based on comments

To be safe, I do a glFinish in the fast non-readPixels path and leave the flush in the readPixels path.

I put the PAD setting back in BasicCanvasLayer.cpp

I cleaned up some white space changes, and returned a printf that I had accidentally removed.

I added a new LOCAL_EGL_CORE_NATIVE_ENGINE define to GLDefs.h to support eglWaitNative call. Previously I had a hard coded value.

I removed the Cairo rounding from this patch.
Comment 29 Jeff Muizelaar [:jrmuizel] 2011-06-23 13:06:36 PDT
Comment on attachment 538971 [details] [diff] [review]
Another update based on comments

Looks good enough. Sorry for the delay in reviewing.
Comment 30 Scott Ruff 2011-06-23 13:35:30 PDT
Thanks. What is the next step. Should I merge the patch with Mozilla Central top of tree and submit one more patch?
Comment 31 Jeff Muizelaar [:jrmuizel] 2011-06-23 20:26:38 PDT
(In reply to comment #30)
> Thanks. What is the next step. Should I merge the patch with Mozilla Central
> top of tree and submit one more patch?

Yes, please.
Comment 32 Scott Ruff 2011-07-08 06:58:16 PDT
Created attachment 544802 [details] [diff] [review]
Patch merged with top of tree
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2011-07-24 21:04:29 PDT
Jeff: ping?
Comment 34 Jeff Muizelaar [:jrmuizel] 2011-07-26 11:07:57 PDT
Sorry about the delay, I've been heads down on some other things.

Note You need to log in before you can comment on or make changes to this bug.