glFlush call in LayerManagerOGL::Render hurts performance

RESOLVED FIXED

Status

()

Core
Graphics
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: roc, Assigned: Joe Drew (not getting mail))

Tracking

unspecified
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(Whiteboard: [1 week])

Attachments

(2 attachments, 7 obsolete attachments)

In http://nerget.com/fluidSim/ with MOZ_ACCELERATED=11, we spend a lot of time (11% of real time on my trunk build) blocked in glFlush on the main thread.

It seems that ideally we wouldn't be calling glFlush:
http://developer.apple.com/mac/library/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_designstrategies/opengl_designstrategies.html#//apple_ref/doc/uid/TP40001987-CH2-SW10

If I just comment out the glFlush call, the performance issue goes away (obviously) but there is an unfortunate side effect: nothing gets rendered in the window. I think this is just the Mac window manager not noticing that the window needs to be updated, since if I move the window around the contents will suddenly be updated.

I suspect there's something clever we can do here to fix this.
(In reply to comment #0)
> I suspect there's something clever we can do here to fix this.

Yes -- use a double-buffered window and swap buffers instead of flushing.  The downside of this is that it means that we have to recomposite the entire window for each frame, or that we have to pay the memory cost for a third surface (front + back + FBO) instead of two now (either front + back, or front + FBO).
Recompositing the entire window for each frame should be OK right?
(In reply to comment #1)
> (In reply to comment #0)
> > I suspect there's something clever we can do here to fix this.
> 
> Yes -- use a double-buffered window and swap buffers instead of flushing.  The
> downside of this is that it means that we have to recomposite the entire window
> for each frame, or that we have to pay the memory cost for a third surface
> (front + back + FBO) instead of two now (either front + back, or front + FBO).

That will not simply incur the flush cost on the SwapBuffer call? (Not saying it does, just wondering)
One obvious thing that would work is to move the entire GL composition to its own thread/process. Fortunately cjones is working on that!
(In reply to comment #0)
> It seems that ideally we wouldn't be calling glFlush:

That's what they say, but in all their examples there's a glFlush() (or a [context flushBuffer]) at the end of drawRect:.

> In http://nerget.com/fluidSim/ with MOZ_ACCELERATED=11, we spend a lot of time
> (11% of real time on my trunk build) blocked in glFlush on the main thread.

With what settings and at how many fps? Might this just be the frame rate throttling kicking in?
I was getting < 20 fps due to lack of JM, so it's not fps throttling.
Created attachment 475615 [details] [diff] [review]
use double buffering instead

I think this is what vlad meant. I've seen neither glFlush nor flushBuffer appear in profiles with this patch, so it might speed things up.
if that works as-is, that'd be a neat trick -- the CGL backend doesn't implement SwapBuffers, nor does it implement IsDoubleBuffered, so all you've effectively done there is select a double buffered pixel format and gotten rid of the Flush().

However selecting the double buffered pixel format should have caused nothing to display, since buffers where never swapped...
Not sure why it works, but it does work. Before this patch, the [mGLContext flushBuffer] is actually a no-op (see docs) - that's why it failed to update the window when roc removed the glFlush call. But with double buffering it seems to do exactly what we need.
Ah, I didn't realize that was being called outside of the GL context machinery.  That would do it.  Not having IsDoubleBuffered means that you're effectively triple-buffering, but that's actually more correct for now until we do some fixups on the double buffer path (that path skips the backbuffer FBO, but it means we need to composite everything per redraw).
We should try this change on linux (all 3 major driver vendors) as we had really inconsistent behaviour with double buffering there.
Yeah, I'd like to always use double buffering, as single buffered displays are generally not as well optimized.  Whether we still use our own backbuffer or not is something we can decide after; we could leave it in for now to preserve identical draw semantics, and then do some perf testing to see if it's worth getting rid of.
This sounds great. Can we get a version of Markus' patch that works on Linux as well, or fix it to only skip glFlush on Mac, so we can get something landed for Mac ASAP?
(Assignee)

Comment 14

8 years ago
Created attachment 475887 [details] [diff] [review]
don't flush on mac

How's this look?
Attachment #475615 - Attachment is obsolete: true
Attachment #475887 - Flags: review?(roc)
Comment on attachment 475887 [details] [diff] [review]
don't flush on mac

Looks fine to me, though I don't really like that we're doing some sort of half-double-buffered thing here, with Widget doing the final swap.  But we can fix that later.
Attachment #475887 - Flags: review+
Attachment #475887 - Flags: review?(roc) → review+
Whiteboard: [needs landing]
(Assignee)

Updated

8 years ago
blocking2.0: --- → betaN+
I think this patch is responsible for the intense flickering I've been noticing both in my own builds and in Joe's tryserver build at http://stage.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jdrew@mozilla.com-8e2c9ce3c1d6/tryserver-macosx/firefox-4.0b7pre.en-US.mac.dmg
It looks like this kind of double buffering doesn't play well with partial window repaints. We only paint into a small part of the window, but we swap the whole window buffer, so those parts that haven't been updated during this paint still look like they did two paints ago (when this buffer was last being painted into), so outdated stuff makes its way back to the screen. I guess this is what vlad meant when he said we'd have to recomposite the entire window for each frame in comment 1...
I wonder why I didn't notice this when I first tested this patch. Maybe something changed in the meantime that broke it, but I doubt it.
(In reply to comment #16)
>  I guess this
> is what vlad meant when he said we'd have to recomposite the entire window for
> each frame in comment 1...

That should be easy to do, right?
Seeing the title of this bug, I just thought I'd mention this: a few days ago, while running in OpenGL debug mode (bug 597881) I saw that glFlush was being called on the WRONG GL CONTEXT i.e. without having properly called MakeCurrent. Is this something you know about, or do you want a stack trace leading to this call?
(Assignee)

Comment 19

8 years ago
Created attachment 478169 [details] [diff] [review]
be double buffered for real

This patch makes us actually double-buffered on Mac. Things seem to work nicely.
Attachment #478169 - Flags: review?(vladimir)
(Assignee)

Comment 20

8 years ago
Created attachment 478324 [details] [diff] [review]
don't SwapBuffers on OS X since we flushBuffers in widget code
Attachment #475887 - Attachment is obsolete: true
Attachment #478169 - Attachment is obsolete: true
Attachment #478324 - Flags: review?(vladimir)
Attachment #478169 - Flags: review?(vladimir)
(Assignee)

Updated

8 years ago
Whiteboard: [needs landing] → [needs review vlad]
Comment on attachment 478324 [details] [diff] [review]
don't SwapBuffers on OS X since we flushBuffers in widget code

># HG changeset patch
># User Markus Stange <mstange@themasta.com>
># Date 1285094378 14400
># Node ID a11ae4f21dfb7a746b1bca4083a165c495d4aa66
># Parent 332fc8b14b6824174f1f4f9d0f24b7e02f3ade9c
>Bug 593342 - Use double buffering on Mac instead of flushing, for greater performance. r=vlad,roc a=b
>
>diff --git a/gfx/layers/opengl/ContainerLayerOGL.cpp b/gfx/layers/opengl/ContainerLayerOGL.cpp
>--- a/gfx/layers/opengl/ContainerLayerOGL.cpp
>+++ b/gfx/layers/opengl/ContainerLayerOGL.cpp
>@@ -191,17 +191,17 @@ ContainerLayerOGL::RenderLayer(int aPrev
>     if (clipRect) {
>       scissorRect = *clipRect;
>     }
> 
>     if (needsFramebuffer) {
>       scissorRect.MoveBy(- visibleRect.TopLeft());
>     }
> 
>-    if (aPreviousFrameBuffer == 0) {
>+    if (!needsFramebuffer) {

Use } else { and combine with previous block

>@@ -658,19 +662,21 @@ LayerManagerOGL::Render()
>     DEBUG_GL_ERROR_CHECK(mGLContext);
>   }
> 
>   mGLContext->fDisableVertexAttribArray(vcattr);
>   mGLContext->fDisableVertexAttribArray(tcattr);
> 
>   DEBUG_GL_ERROR_CHECK(mGLContext);
> 
>+#ifndef XP_MACOSX
>   mGLContext->fFlush();
> 
>   DEBUG_GL_ERROR_CHECK(mGLContext);
>+#endif
> }

Don't need this -- this path will never be taken for double buffered contexts, which all mac will be.

Looks fine otherwise.
Attachment #478324 - Flags: review?(vladimir) → review+
SwapBuffers is a nop unless explicitly implemented by the context provider. We should move the flushBuffers call from the cocoa widget into GLContextProviderCGL::SwapBuffers and then leave the call to SwapBuffers in GL.

We can also probably remove the MakeCurrent call in widget, I believe the rendering code handles this.

>-    if (aPreviousFrameBuffer == 0) {
>+    if (!needsFramebuffer) {

This should be if (!needsFramebuffer && gl()->IsDoubleBuffered()) { for it to work without double buffering.

I get worse performance with double buffering enabled on https://bug584494.bugzilla.mozilla.org/attachment.cgi?id=463418 (both fps, and amount of cpu time spent in swapping buffers/flushing), maybe we should add an env var to disable this until we get accurate measurement of the performance?
The aquarium demo is another testcase where I see a lot of time spent inside the flush. Curiously though, enforcing beam sync in Quartz Debug makes that cost go away almost completely and speeds things up.
Maybe we should back out bug 418311?
(Assignee)

Updated

8 years ago
Assignee: nobody → joe
(Assignee)

Updated

8 years ago
Whiteboard: [needs review vlad]
(Assignee)

Comment 24

8 years ago
Created attachment 483991 [details] [diff] [review]
double buffer more correctly, but still not 100% correctly

This patch gets us closer to working, but we still fail reftests, and I'm not sure why. I'm reasonably sure that we're reading back the correct thing (at least most of the time), but we still fail some of the the reftest sanity tests.
(Assignee)

Updated

8 years ago
Whiteboard: [1 week]
Created attachment 486248 [details] [diff] [review]
Double buffer on OSX v3 (I think)

Rebased on top of bug 599507
Added env var to enable/disable double buffering.
Cleaned up cocoa widget code.

This appears to pass reftests fine for me.
Attachment #483991 - Attachment is obsolete: true
Attachment #486248 - Flags: review?(joe)
Created attachment 486300 [details] [diff] [review]
Double buffer on OSX v4

Simplified the scissoring logic further, and added an in depth comment explaining the logic. We've hit issues with scissoring way too many times already :(
Attachment #486248 - Attachment is obsolete: true
Attachment #486300 - Flags: review?(joe)
Attachment #486248 - Flags: review?(joe)
Created attachment 486862 [details] [diff] [review]
Double buffer on OSX v5

Fixed the reftest failures properly. GL has 0,0 at the bottom left and we need to Y-flip the results of glReadPixels to get the image we expect. The old code didn't have this issue because we render upside-down to FBO's.
Attachment #478324 - Attachment is obsolete: true
Attachment #486300 - Attachment is obsolete: true
Attachment #486862 - Flags: review?(joe)
Attachment #486300 - Flags: review?(joe)
I'll run this patch through try to confirm (It certainly works for me) and to see what affect it has on linux once I rid my queue of 602200 woes.
(Assignee)

Comment 29

8 years ago
This still fails several reftests on my computer :/
(Assignee)

Comment 30

8 years ago
Comment on attachment 486862 [details] [diff] [review]
Double buffer on OSX v5

Despite it failing reftests on my computer, if this passes reftests on our minis, we should commit it, I think. We should remove the fDrawBuffer junk that I added for debugging, though!
Attachment #486862 - Flags: review?(joe) → review+
Looks like this only happens on opt builds. Not entirely sure why.

Both data:text/html,<body> tests leave the background transparent instead of white. Linux also seems to be getting a lot of tests failing by 800 pixels (one row only) with transparent instead of white pixels.

Might try this on linux with valgrind to see if anything obvious shows

I've tried changing the clear colour to solid white instead of transparent, no luck. Something is drawing over with transparent pixels somehow.
You could check if the code here http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#6034 is doing it, or check if a nsDisplaySolidColor or nsDisplayCanvasBackground is painting the transparent pixels.
Created attachment 487853 [details] [diff] [review]
Double buffer on OSX v6

Ok, I feel we've wasted enough time on this for the moment.

This patch renders to an FBO when we're going to be doing readback. Not a perfect solution but passes all reftests on try (and only failed two on linux!).

We can file a follow-up to investigate the underlying issue later on if anyone is worried about this hackery.
Attachment #487853 - Flags: review?(joe)
(Assignee)

Comment 34

8 years ago
Comment on attachment 487853 [details] [diff] [review]
Double buffer on OSX v6

I'm not overjoyed about this, since we won't be testing *exactly* the same codepath as we usually use, but we definitely want to be double-buffered.
Attachment #487853 - Flags: review?(joe) → review+
(Assignee)

Comment 35

8 years ago
http://hg.mozilla.org/mozilla-central/rev/16b51dc62bbd
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
(Assignee)

Updated

8 years ago
Blocks: 612400
You need to log in before you can comment on or make changes to this bug.