Last Comment Bug 675474 - svg/as-image failures with OpenGL layers on linux
: svg/as-image failures with OpenGL layers on linux
Status: RESOLVED FIXED
[inbound]
:
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: unspecified
: x86 Linux
: -- normal (vote)
: mozilla9
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 687831
Blocks: ogl-linux-beta 678134
  Show dependency treegraph
 
Reported: 2011-07-30 21:03 PDT by Matt Woodrow (:mattwoodrow)
Modified: 2012-04-23 19:23 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Draw all thebes content before compositing (6.45 KB, patch)
2011-08-16 21:33 PDT, Matt Woodrow (:mattwoodrow)
no flags Details | Diff | Review
Draw all thebes content before compositing v2 (11.58 KB, patch)
2011-08-16 23:32 PDT, Matt Woodrow (:mattwoodrow)
roc: review+
jacob.benoit.1: feedback+
Details | Diff | Review
Draw all thebes content before compositing v3 (11.89 KB, patch)
2011-08-18 18:32 PDT, Matt Woodrow (:mattwoodrow)
jacob.benoit.1: review+
matt.woodrow: review+
Details | Diff | Review
Use glXWaitX instead of XSync when binding a pixmap to a texture (4.28 KB, patch)
2011-08-23 18:51 PDT, Matt Woodrow (:mattwoodrow)
matt.woodrow: review+
Details | Diff | Review
Draw all thebes content before compositing v4 (9.79 KB, patch)
2011-08-23 18:51 PDT, Matt Woodrow (:mattwoodrow)
no flags Details | Diff | Review

Description Matt Woodrow (:mattwoodrow) 2011-07-30 21:03:24 PDT
With OpenGL layers enabled on linux, I'm getting a bunch of intermittent failures in svg/as-image. These are bottom parts of the image and look like some sort of synchronization issue.

Example reftest log: http://tinderbox.mozilla.org/showlog.cgi?log=Try/1311929219.1311931062.4099.gz&fulltext=1

Example failure: http://pastebin.mozilla.org/1286043 (in case the reftest log is deleted)

I can't reproduce any of these on my system.

I've tried all sorts of synchronization tests when reading the GL buffer back to software, including usleep(), XSync() and GLXWaitGL(). We also do an XSync() before binding the X pixmap to a texture.
Comment 1 Matt Woodrow (:mattwoodrow) 2011-08-05 00:19:47 PDT
Interestingly adding usleep(100000) into the bind texture call (after XSync) fixes the problem. I also tried adding reftest-wait and 1 second setTimeout, and this had no effect.

What synchronization could we be waiting on, that isn't fixed by XSync() and would have changed with GL?

Is it possible that there is more than one X display, and XSync(DefaultXDisplay(), False) is incorrect?
Comment 2 Marco Castelluccio [:marco] 2011-08-05 11:32:45 PDT
The function DefaultXDisplay (https://mxr.mozilla.org/mozilla-central/source/gfx/src/X11Util.h#66) uses GDK_DISPLAY that is deprecated since 2.22:
http://developer.gnome.org/gdk/stable/gdk-X-Window-System-Interaction.html#GDK-DISPLAY:CAPS

Maybe using this new function would correct this behaviour? Or even better, you could try using XOpenDisplay(NULL), that should work with either GTK or QT.

However maybe using other functions than XSync could be better in terms of performance (http://www.opengl.org/sdk/docs/man/xhtml/glXWaitX.xml or the last lines of http://glprogramming.com/red/appendixg.html)
Comment 3 Matt Woodrow (:mattwoodrow) 2011-08-06 03:09:08 PDT
Thanks Marco. I'm currently more worried about correctness than performance though.

As you correctly point out, XSync() should be overkill for this operation (and something that we should change once this bug is sorted). Instead I'm needing to manually sleep to get the correct synchronization.

I can think of two possible causes:

- The surface's XDisplay() is different to that of DefaultXDisplay(), and we're syncing the wrong display connection. (Does this even make a difference?) I'll run a tryserver build soon to test this possibility.

- These svg/as-image tests are waiting on something separate. Are svg data urls decoded in a separate thread? Is it possible that by requiring that surface is reading slightly earlier (at thebes layer render time, instead of XSync'ing at the end of compositing) that we are just attempting to draw before the decode is done?

Any other possible ideas would be appreciated, I am very much guessing here.
Comment 4 Marco Castelluccio [:marco] 2011-08-06 06:21:29 PDT
The problem of the wrong display sounds strange, I think this isn't the cause of the failed reftest.

It could be a synchronization problem, above all if SVGs are drawn in separate threads with OpenGL. In this case infact I think XSync isn't enough for synchronization. I think the best thing to do is to have a unique context and use a main thread to dispatch OpenGL commands.

http://developer.amd.com/gpu_assets/GDC2005_OpenGL_Performance.pdf - From page 33 to page 37.

http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html

http://abral.altervista.org/code/?c=glthreads - This is an example from Mesa of multiple rendering threads.

These three links are useful only if SVGs are drawn in different threads with OpenGL. In this case we can also try to disable OpenGL acceleration from Cairo (that, I suppose, draws SVGs) and see what happens.

Have you tried also glXWaitX() or glFlush() or glFinish() instead of XSync()? glFinish() should wait until every OpenGL command is complete, so this should be the slowest but should guarantee strict synchronization.
Comment 5 Matt Woodrow (:mattwoodrow) 2011-08-10 19:03:21 PDT
We only have compositing happening with OpenGL. All SVG drawing will be happening in software, through normal X API's.

Indeed the X Display thing appears to not be an issue.
Comment 6 Matt Woodrow (:mattwoodrow) 2011-08-16 21:33:31 PDT
Created attachment 553675 [details] [diff] [review]
Draw all thebes content before compositing

I've tried glXWaitX, and this patch to draw all thebes content before compositing.

This should improve performance (reduce the number of synchronization points we need), and I was hoping it would reduce the problem by delaying our bind texture calls.

The problems persist on tryserver and I've run out of ideas at this point.
Comment 7 Matt Woodrow (:mattwoodrow) 2011-08-16 23:32:46 PDT
Created attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Hrm, I may have mixed up my tryserver emails. Looks like I didn't get any failures with glXWaitX.

Benoit: Can you test this to see if it fixes the problem you're seeing in bug 678134?

I'll test more thoroughly with tryserver when the trees reopen.
Comment 8 Benoit Jacob [:bjacob] (mostly away) 2011-08-17 11:57:53 PDT
Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Had to apply the patch from bug 675532 first.

This patch fixes bug 678134 for me! \o/
Comment 9 Matt Woodrow (:mattwoodrow) 2011-08-18 01:36:13 PDT
Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Tryserver looks good! Hooray
Comment 10 Benoit Jacob [:bjacob] (mostly away) 2011-08-18 15:04:13 PDT
Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Adding :roc for reviewing the Layers bits.

>   mGLContext->fClearColor(0.0, 0.0, 0.0, 0.0);
>   mGLContext->fClear(LOCAL_GL_COLOR_BUFFER_BIT | LOCAL_GL_DEPTH_BUFFER_BIT);
> 

This isn't part of your patch, just context, but I would like to know: the mGLContext here, could it ever be the GL context underlying a WebGL context? Like, in the case of a CanvasLayer corresponding to a WebGL context? I ask because, in this case, we would really need to call GLContext::ClearSafely() here instead of the above naive clearing code.

>-  // Drawing thebes layers can change the current context, reset it.
>+void
>+ThebesLayerOGL::RenderLayer(int aPreviousFrameBuffer,
>+                            const nsIntPoint& aOffset)
>+{
>+  NS_ABORT_IF_FALSE(mBuffer, "should have a buffer here");
>+  
>+  gfxMatrix transform2d;
>+  PRUint32 flags = 0;
>+  if (GetEffectiveTransform().Is2D(&transform2d)) {
>+    if (transform2d.HasNonIntegerTranslation()) {
>+      flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
>+    }
>+  } else {
>+    flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
>+  }
>+
>   gl()->MakeCurrent();
>+  gl()->fActiveTexture(LOCAL_GL_TEXTURE0);

Why is a new fActiveTexture call needed here? If it wasn't needed before.

> 
>     PRBool SwapBuffers()
>     {
>         if (!mDoubleBuffered)
>             return PR_FALSE;
>         sGLXLibrary.xSwapBuffers(mDisplay, mDrawable);
>+        fFinish();
>         sGLXLibrary.xWaitGL();
>         return PR_TRUE;
>     }

Why is the finish() needed here? Also, have you measured the impact on performance?

There are GL contexts that never interfere with XRender rendering, such as WebGL contexts. Why should they pay the price of having a finish() at each swapBuffer()?
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2011-08-18 15:06:59 PDT
> There are GL contexts that never interfere with XRender rendering, such as
> WebGL contexts. Why should they pay the price of having a finish() at each
> swapBuffer()?

Hm, I guess this was a stupid question as WebGL contexts render to a single framebuffer object so they're not concerned with SwapBuffer. Ignore that.
Comment 12 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-08-18 17:02:13 PDT
Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Review of attachment 553688 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/ThebesLayerOGL.cpp
@@ +729,5 @@
> +      flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
> +    }
> +  } else {
> +    flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
> +  }

Change both occurrences of this to the simpler
  if (!GetEffectiveTransform().Is2D(&transform2d) ||
      transform2d.HasNonIntegerTranslation()) {
    flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
  }

... or better still, add gfx3DMatrix::HasNonIntegerTranslation() and call that.
Comment 13 Matt Woodrow (:mattwoodrow) 2011-08-18 18:22:48 PDT
> This isn't part of your patch, just context, but I would like to know: the
> mGLContext here, could it ever be the GL context underlying a WebGL context?
> Like, in the case of a CanvasLayer corresponding to a WebGL context? I ask
> because, in this case, we would really need to call GLContext::ClearSafely()
> here instead of the above naive clearing code.

No it can't, |this| is a LayerManagerOGL, so mGLContext is always a context wrapping a window.

> 
> Why is a new fActiveTexture call needed here? If it wasn't needed before.

It was always there, just before the Thebes drawing code.

> Why is the finish() needed here? Also, have you measured the impact on
> performance?
> 
> There are GL contexts that never interfere with XRender rendering, such as
> WebGL contexts. Why should they pay the price of having a finish() at each
> swapBuffer()?

This is not intentionally part of the patch, will remove it :)
Comment 14 Matt Woodrow (:mattwoodrow) 2011-08-18 18:32:40 PDT
Created attachment 554284 [details] [diff] [review]
Draw all thebes content before compositing v3

Removed fFinish call, and tidied the resampling check.

Note that I didn't implement gfx3DMatrix::HasNonIntegerTranslation for the sake of my 3d transforms patches. I'm happy to add this later if you like.
Comment 15 Matt Woodrow (:mattwoodrow) 2011-08-18 18:33:25 PDT
Comment on attachment 554284 [details] [diff] [review]
Draw all thebes content before compositing v3

Carrying forward r=roc
Comment 16 Matt Woodrow (:mattwoodrow) 2011-08-19 18:56:22 PDT
As a side note, it's interesting that swapping XSync for glXWaitX fixes this bug. The texture_from_pixmap spec (http://www.opengl.org/registry/specs/EXT/texture_from_pixmap.txt) says that either can be used to synchronize before binding a texture.

It appears that XSync is flushing the X command pipeline, but not waiting for the (assumedly OpenGL) XRender pipeline to complete.
Comment 17 Matt Woodrow (:mattwoodrow) 2011-08-19 19:06:40 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/58147380793b
Comment 18 Ed Morley [:emorley] 2011-08-20 05:54:57 PDT
I believe one of the changesets in that push (bug 594876, bug 675474 or bug 675532) has caused an OSX 10.6 reftest perma-orange, so needs backing out:
http://tbpl.allizom.org/?tree=Mozilla-Inbound&usebuildbot=1&rev=0a920411e64c
Comment 20 Dão Gottwald [:dao] 2011-08-22 05:42:09 PDT
Note that Talos reported massive perf regressions on Linux for the push this was part.
Comment 21 Matt Woodrow (:mattwoodrow) 2011-08-23 18:51:17 PDT
Created attachment 555276 [details] [diff] [review]
Use glXWaitX instead of XSync when binding a pixmap to a texture

Looks like this patch also broke some tests on mac.

Splitting this into the glXWaitX() part which is needed for GLX, and the Validate() cycle which is an optimization for all platforms.

Carrying forward r=bjacob
Comment 22 Matt Woodrow (:mattwoodrow) 2011-08-23 18:51:48 PDT
Created attachment 555277 [details] [diff] [review]
Draw all thebes content before compositing v4
Comment 23 Matt Woodrow (:mattwoodrow) 2011-08-23 19:23:10 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/99c5e7b68207


Landed the first part on inbound
Comment 24 Marco Bonardo [::mak] 2011-08-24 01:54:39 PDT
http://hg.mozilla.org/mozilla-central/rev/99c5e7b68207

Note You need to log in before you can comment on or make changes to this bug.