Closed Bug 675474 Opened 13 years ago Closed 13 years ago

svg/as-image failures with OpenGL layers on linux

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla9

People

(Reporter: mattwoodrow, Unassigned)

References

Details

(Whiteboard: [inbound])

Attachments

(2 files, 3 obsolete files)

Draw all thebes content before compositing 13 years ago Matt Woodrow (:mattwoodrow) 6.45 KB, patch		Details \| Diff \| Splinter Review
Draw all thebes content before compositing v2 13 years ago Matt Woodrow (:mattwoodrow) 11.58 KB, patch	roc : review+ bjacob : feedback+	Details \| Diff \| Splinter Review
Draw all thebes content before compositing v3 13 years ago Matt Woodrow (:mattwoodrow) 11.89 KB, patch	bjacob : review+ mattwoodrow : review+	Details \| Diff \| Splinter Review
Use glXWaitX instead of XSync when binding a pixmap to a texture 13 years ago Matt Woodrow (:mattwoodrow) 4.28 KB, patch	mattwoodrow : review+	Details \| Diff \| Splinter Review
Draw all thebes content before compositing v4 13 years ago Matt Woodrow (:mattwoodrow) 9.79 KB, patch		Details \| Diff \| Splinter Review

Matt Woodrow (:mattwoodrow)

Reporter

Description

•

13 years ago

With OpenGL layers enabled on linux, I'm getting a bunch of intermittent failures in svg/as-image. These are bottom parts of the image and look like some sort of synchronization issue.

Example reftest log: http://tinderbox.mozilla.org/showlog.cgi?log=Try/1311929219.1311931062.4099.gz&fulltext=1

Example failure: http://pastebin.mozilla.org/1286043 (in case the reftest log is deleted)

I can't reproduce any of these on my system.

I've tried all sorts of synchronization tests when reading the GL buffer back to software, including usleep(), XSync() and GLXWaitGL(). We also do an XSync() before binding the X pixmap to a texture.

Matt Woodrow (:mattwoodrow)

Reporter

Comment 1

•

13 years ago

Interestingly adding usleep(100000) into the bind texture call (after XSync) fixes the problem. I also tried adding reftest-wait and 1 second setTimeout, and this had no effect.

What synchronization could we be waiting on, that isn't fixed by XSync() and would have changed with GL?

Is it possible that there is more than one X display, and XSync(DefaultXDisplay(), False) is incorrect?

Marco Castelluccio [:marco]

Comment 2

•

13 years ago

The function DefaultXDisplay (https://mxr.mozilla.org/mozilla-central/source/gfx/src/X11Util.h#66) uses GDK_DISPLAY that is deprecated since 2.22:
http://developer.gnome.org/gdk/stable/gdk-X-Window-System-Interaction.html#GDK-DISPLAY:CAPS

Maybe using this new function would correct this behaviour? Or even better, you could try using XOpenDisplay(NULL), that should work with either GTK or QT.

However maybe using other functions than XSync could be better in terms of performance (http://www.opengl.org/sdk/docs/man/xhtml/glXWaitX.xml or the last lines of http://glprogramming.com/red/appendixg.html)

Matt Woodrow (:mattwoodrow)

Reporter

Comment 3

•

13 years ago

Thanks Marco. I'm currently more worried about correctness than performance though.

As you correctly point out, XSync() should be overkill for this operation (and something that we should change once this bug is sorted). Instead I'm needing to manually sleep to get the correct synchronization.

I can think of two possible causes:

- The surface's XDisplay() is different to that of DefaultXDisplay(), and we're syncing the wrong display connection. (Does this even make a difference?) I'll run a tryserver build soon to test this possibility.

- These svg/as-image tests are waiting on something separate. Are svg data urls decoded in a separate thread? Is it possible that by requiring that surface is reading slightly earlier (at thebes layer render time, instead of XSync'ing at the end of compositing) that we are just attempting to draw before the decode is done?

Any other possible ideas would be appreciated, I am very much guessing here.

Marco Castelluccio [:marco]

Comment 4

•

13 years ago

The problem of the wrong display sounds strange, I think this isn't the cause of the failed reftest.

It could be a synchronization problem, above all if SVGs are drawn in separate threads with OpenGL. In this case infact I think XSync isn't enough for synchronization. I think the best thing to do is to have a unique context and use a main thread to dispatch OpenGL commands.

http://developer.amd.com/gpu_assets/GDC2005_OpenGL_Performance.pdf - From page 33 to page 37.

http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html

http://abral.altervista.org/code/?c=glthreads - This is an example from Mesa of multiple rendering threads.

These three links are useful only if SVGs are drawn in different threads with OpenGL. In this case we can also try to disable OpenGL acceleration from Cairo (that, I suppose, draws SVGs) and see what happens.

Have you tried also glXWaitX() or glFlush() or glFinish() instead of XSync()? glFinish() should wait until every OpenGL command is complete, so this should be the slowest but should guarantee strict synchronization.

Matt Woodrow (:mattwoodrow)

Reporter

Comment 5

•

13 years ago

We only have compositing happening with OpenGL. All SVG drawing will be happening in software, through normal X API's.

Indeed the X Display thing appears to not be an issue.

Matt Woodrow (:mattwoodrow)

Reporter

Comment 6

•

13 years ago

Attached patch Draw all thebes content before compositing (obsolete) — Details — Splinter Review

I've tried glXWaitX, and this patch to draw all thebes content before compositing.

This should improve performance (reduce the number of synchronization points we need), and I was hoping it would reduce the problem by delaying our bind texture calls.

The problems persist on tryserver and I've run out of ideas at this point.

Attachment #553675 - Flags: feedback?(bjacob)

Matt Woodrow (:mattwoodrow)

Reporter

Comment 7

•

13 years ago

Attached patch Draw all thebes content before compositing v2 (obsolete) — Details — Splinter Review

Hrm, I may have mixed up my tryserver emails. Looks like I didn't get any failures with glXWaitX.

Benoit: Can you test this to see if it fixes the problem you're seeing in bug 678134?

I'll test more thoroughly with tryserver when the trees reopen.

Attachment #553675 - Attachment is obsolete: true

Attachment #553675 - Flags: feedback?(bjacob)

Attachment #553688 - Flags: feedback?(bjacob)

Benoit Jacob [:bjacob] (mostly away)

Comment 8

•

13 years ago

Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Had to apply the patch from bug 675532 first.

This patch fixes bug 678134 for me! \o/

Attachment #553688 - Flags: feedback?(bjacob) → feedback+

Benoit Jacob [:bjacob] (mostly away)

Updated

•

13 years ago

Blocks: 678134

Matt Woodrow (:mattwoodrow)

Reporter

Comment 9

•

13 years ago

Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Tryserver looks good! Hooray

Attachment #553688 - Flags: review?(bjacob)

Benoit Jacob [:bjacob] (mostly away)

Comment 10

•

13 years ago

Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Adding :roc for reviewing the Layers bits.

>   mGLContext->fClearColor(0.0, 0.0, 0.0, 0.0);
>   mGLContext->fClear(LOCAL_GL_COLOR_BUFFER_BIT | LOCAL_GL_DEPTH_BUFFER_BIT);
> 

This isn't part of your patch, just context, but I would like to know: the mGLContext here, could it ever be the GL context underlying a WebGL context? Like, in the case of a CanvasLayer corresponding to a WebGL context? I ask because, in this case, we would really need to call GLContext::ClearSafely() here instead of the above naive clearing code.

>-  // Drawing thebes layers can change the current context, reset it.
>+void
>+ThebesLayerOGL::RenderLayer(int aPreviousFrameBuffer,
>+                            const nsIntPoint& aOffset)
>+{
>+  NS_ABORT_IF_FALSE(mBuffer, "should have a buffer here");
>+  
>+  gfxMatrix transform2d;
>+  PRUint32 flags = 0;
>+  if (GetEffectiveTransform().Is2D(&transform2d)) {
>+    if (transform2d.HasNonIntegerTranslation()) {
>+      flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
>+    }
>+  } else {
>+    flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
>+  }
>+
>   gl()->MakeCurrent();
>+  gl()->fActiveTexture(LOCAL_GL_TEXTURE0);

Why is a new fActiveTexture call needed here? If it wasn't needed before.

> 
>     PRBool SwapBuffers()
>     {
>         if (!mDoubleBuffered)
>             return PR_FALSE;
>         sGLXLibrary.xSwapBuffers(mDisplay, mDrawable);
>+        fFinish();
>         sGLXLibrary.xWaitGL();
>         return PR_TRUE;
>     }

Why is the finish() needed here? Also, have you measured the impact on performance?

There are GL contexts that never interfere with XRender rendering, such as WebGL contexts. Why should they pay the price of having a finish() at each swapBuffer()?

Attachment #553688 - Flags: review?(roc)

Benoit Jacob [:bjacob] (mostly away)

Comment 11

•

13 years ago

> There are GL contexts that never interfere with XRender rendering, such as
> WebGL contexts. Why should they pay the price of having a finish() at each
> swapBuffer()?

Hm, I guess this was a stupid question as WebGL contexts render to a single framebuffer object so they're not concerned with SwapBuffer. Ignore that.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

13 years ago

Comment on attachment 553688 [details] [diff] [review]
Draw all thebes content before compositing v2

Review of attachment 553688 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/ThebesLayerOGL.cpp
@@ +729,5 @@
> +      flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
> +    }
> +  } else {
> +    flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
> +  }

Change both occurrences of this to the simpler
  if (!GetEffectiveTransform().Is2D(&transform2d) ||
      transform2d.HasNonIntegerTranslation()) {
    flags |= ThebesLayerBufferOGL::PAINT_WILL_RESAMPLE;
  }

... or better still, add gfx3DMatrix::HasNonIntegerTranslation() and call that.

Attachment #553688 - Flags: review?(roc) → review+

Matt Woodrow (:mattwoodrow)

Reporter

Comment 13

•

13 years ago

> This isn't part of your patch, just context, but I would like to know: the
> mGLContext here, could it ever be the GL context underlying a WebGL context?
> Like, in the case of a CanvasLayer corresponding to a WebGL context? I ask
> because, in this case, we would really need to call GLContext::ClearSafely()
> here instead of the above naive clearing code.

No it can't, |this| is a LayerManagerOGL, so mGLContext is always a context wrapping a window.

> 
> Why is a new fActiveTexture call needed here? If it wasn't needed before.

It was always there, just before the Thebes drawing code.

> Why is the finish() needed here? Also, have you measured the impact on
> performance?
> 
> There are GL contexts that never interfere with XRender rendering, such as
> WebGL contexts. Why should they pay the price of having a finish() at each
> swapBuffer()?

This is not intentionally part of the patch, will remove it :)

Matt Woodrow (:mattwoodrow)

Reporter

Comment 14

•

13 years ago

Attached patch Draw all thebes content before compositing v3 (obsolete) — Details — Splinter Review

Removed fFinish call, and tidied the resampling check.

Note that I didn't implement gfx3DMatrix::HasNonIntegerTranslation for the sake of my 3d transforms patches. I'm happy to add this later if you like.

Attachment #554284 - Flags: review?(bjacob)

Matt Woodrow (:mattwoodrow)

Reporter

Updated

•

13 years ago

Attachment #553688 - Attachment is obsolete: true

Attachment #553688 - Flags: review?(bjacob)

Matt Woodrow (:mattwoodrow)

Reporter

Comment 15

•

13 years ago

Comment on attachment 554284 [details] [diff] [review]
Draw all thebes content before compositing v3

Carrying forward r=roc

Attachment #554284 - Flags: review+

Benoit Jacob [:bjacob] (mostly away)

Updated

•

13 years ago

Attachment #554284 - Flags: review?(bjacob) → review+

Matt Woodrow (:mattwoodrow)

Reporter

Comment 16

•

13 years ago

As a side note, it's interesting that swapping XSync for glXWaitX fixes this bug. The texture_from_pixmap spec (http://www.opengl.org/registry/specs/EXT/texture_from_pixmap.txt) says that either can be used to synchronize before binding a texture.

It appears that XSync is flushing the X command pipeline, but not waiting for the (assumedly OpenGL) XRender pipeline to complete.

Matt Woodrow (:mattwoodrow)

Reporter

Comment 17

•

13 years ago

http://hg.mozilla.org/integration/mozilla-inbound/rev/58147380793b

Whiteboard: [inbound]

Ed Morley [:emorley]

Comment 18

•

13 years ago

I believe one of the changesets in that push (bug 594876, bug 675474 or bug 675532) has caused an OSX 10.6 reftest perma-orange, so needs backing out:
http://tbpl.allizom.org/?tree=Mozilla-Inbound&usebuildbot=1&rev=0a920411e64c

Justin Wood (:Callek)

Comment 19

•

13 years ago

Backed out on inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/7797172fc164
http://hg.mozilla.org/integration/mozilla-inbound/rev/955f83dc4372

Whiteboard: [inbound]

Dão Gottwald [:dao]

Comment 20

•

13 years ago

Note that Talos reported massive perf regressions on Linux for the push this was part.

Matt Woodrow (:mattwoodrow)

Reporter

Comment 21

•

13 years ago

Attached patch Use glXWaitX instead of XSync when binding a pixmap to a texture — Details — Splinter Review

Looks like this patch also broke some tests on mac.

Splitting this into the glXWaitX() part which is needed for GLX, and the Validate() cycle which is an optimization for all platforms.

Carrying forward r=bjacob

Attachment #554284 - Attachment is obsolete: true

Attachment #555276 - Flags: review+

Matt Woodrow (:mattwoodrow)

Reporter

Comment 22

•

13 years ago

Attached patch Draw all thebes content before compositing v4 — Details — Splinter Review

Matt Woodrow (:mattwoodrow)

Reporter

Comment 23

•

13 years ago

http://hg.mozilla.org/integration/mozilla-inbound/rev/99c5e7b68207


Landed the first part on inbound

Whiteboard: [inbound]

Marco Bonardo [:mak]

Comment 24

•

13 years ago

http://hg.mozilla.org/mozilla-central/rev/99c5e7b68207

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla9

Karl Tomlinson (:karlt)

Updated

•

12 years ago

Depends on: 687831

You need to log in before you can comment on or make changes to this bug.