Closed Bug 697831 Opened 13 years ago Closed 12 years ago

Accelerated layers should not use glFinish to synchronize textures across contexts

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
blocking-basecamp -

People

(Reporter: jgilbert, Assigned: jgilbert)

References

Details

(Keywords: perf, regression)

Attachments

(1 file, 2 obsolete files)

For synchronizing textures across contexts, we should not be using these commands:

glFlush:
Only guarantees that the commands in the command buffer will be sent to the GPU in finite time. That is, it flushes the command buffer to the GPU, instead of possibly waiting for more commands. It does not guarantee *when* the commands will be completed, but it also does not block.

glFinish:
Flushes all commands to the GPU, and blocks until the commands are all 'fully realized'. This has to block until the GPU finishes, sends an indication to the CPU that rendering is complete, and which finally releases the block. This is slow because we not only wait for everything to complete rendering (what we want), but we also have to wait for reception of the notification of such. (slow)

As it stands now, we must use glFinish to assure rendering is completed, but this comes at a performance penalty. It should be noted that glFinish is not required for glReadPixels, or any of the GPU-local commands.

What we need to use is a method that acts as a GPU-local blocker for commands that depend rendering to an object in another context being completed, without having to interact with the CPU again.

This functionality is added by ARB_sync: http://www.opengl.org/registry/specs/ARB/sync.txt
Needs testing on non-windows platforms, since windows doesn't support OGL layers.
Assignee: nobody → jgilbert
Status: NEW → ASSIGNED
Attachment #570872 - Flags: feedback?(bjacob)
Comment on attachment 570872 [details] [diff] [review]
Implements ARB_sync to allow for GPU-local signalling on OGL layers

Yeah, this doesn't work yet.
Attachment #570872 - Flags: feedback?(bjacob)
At least, not on my mac machine, that is.
glFlush considered relatively harmless, but we really should not use glFinish.
Summary: Accelerated layers should not use glFlush or glFinish to synchronize textures across contexts → Accelerated layers should not use glFinish to synchronize textures across contexts
Target Milestone: --- → mozilla10
Version: unspecified → Trunk
Target Milestone: mozilla10 → ---
Attachment #575375 - Flags: review?(bjacob)
Comment on attachment 575375 [details] [diff] [review]
Remove unnecessary glFlush/glFinish from GLContext

Review of attachment 575375 [details] [diff] [review]:
-----------------------------------------------------------------

I just need a little explanation:

::: gfx/thebes/GLContext.h
@@ +1004,5 @@
>  
> +    void ReadyOffscreenTexture() {
> +        BlitDirtyFBOs();
> +        fFlush();
> +    }

I don't understand, didn't we say that glFlush doesn't really do anything useful for us?
Attachment #575375 - Flags: review?(bjacob) → review-
Bas had a good point that if glFlush is sufficient to get the context's commands to the hardware's serial buffer, we can guarantee that even commands across different contexts are executed in the order we want, and that the contents of the backing shared texture will be complete and valid for the layers context.

Unfortunately, upon re-reading the spec, it explicitly does not guarantee this. The main relevant section is D.3.3, specifically Rule 2:

While a container object C is bound, any changes made to the contents of
C’s attachments in the current context are guaranteed to be seen. To guarantee
seeing changes made in another context to objects attached to C, such changes
must be completed in that other context (see section D.3.1) prior to C being bound.
Changes made in another context but not determined to have completed as described
in section D.3.1, or after C is bound in the current context, are not guaranteed
to be seen.

Section D.3.1 reads:
The contents of an object T are considered to have been changed once a command
such as described in section D.3 has completed. Completion of a command 1 may
be determined either by calling Finish, or by calling FenceSync and executing a
WaitSync command on the associated sync object. The second method does not
require a round trip to the GL server and may be more efficient, particularly when
changes to T in one context must be known to have completed before executing
commands dependent on those changes in another context.

This wording is from OpenGL 4.2 spec, but OpenGL ES 2.0 spec has similar wording, though omitting fence/sync objects.

As such, our earlier (but not earliest) understanding was correct: We cannot rely on glFlush being sufficient for all implementations, though some implementations may guarantee that it can. Specifically, on ANGLE, glFlush uses a similar function in D3D, which Bas has told me *does* guarantee in-order executions of commands, though we should probably double-check. A number of texts relating to Apple's OGL implementations leads me to believe this is true for them as well.
Basically, we can use glFlush if the platform's implementation explicitly supports it, but otherwise, as per spec, we must use either glFinish or ARB_sync.

A relevant link:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=279182
(In reply to Jeff Gilbert [:jgilbert] from comment #7)
> As such, our earlier (but not earliest) understanding was correct: We cannot
> rely on glFlush being sufficient for all implementations, though some
> implementations may guarantee that it can. Specifically, on ANGLE, glFlush
> uses a similar function in D3D, which Bas has told me *does* guarantee
> in-order executions of commands, though we should probably double-check. A
> number of texts relating to Apple's OGL implementations leads me to believe
> this is true for them as well.

Just FYI, this isn't completely true, D3D9 does not have a real flush, the kind of flush it does have is more like a 'glFinish' (using a GPU Query and waiting for it to have executed). D3D10 -does- have a real Flush, which guarantees in-order execution with different D3D10 devices, but -not- across D3D10 and D3D9 devices.
I made a wiki page for this glFlush/glFinish stuff: https://wiki.mozilla.org/Platform/GFX/CrossContextSyncing
Depends on: 705024
Blocks: omtc
Here's a profiling showing just how bad it is hurting us on:
https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/demos/google/nvidia-vertex-buffer-object/index.html

varium.fantasytalesonline.com/cleopatra/?report=AMIfv95h5nCbkSXxz9W_T9bUQFhogzwtYfpHL8PK8djmoDyaT7Z-ITkNNmA5ME7eQgIRb5IJsG2A3JVlDZCtwKfurHjN6bb7C0raCVRXClfoEylTFOmKv-JibmzVvdtb8MybnMUWdxguqNSzBYKVQobJYobUQAJ_SQ

We're spending 31% of our time waiting on glFinish.
Depends on: 721115
This doesn't need to block beta for Fennec unless we are specifically hurt worse there than on other OSes. We will get this in good time with ARB_sync etc.
Depends on: 739421
Attachment #575375 - Attachment is obsolete: true
Attached patch Proof of conceptSplinter Review
Here's a proof-of-concept for OGL layers which can be switch between Finish, ClientWaitSync, and WaitSync for resolving.

Note that I don't think we can get away with using (Server)WaitSync since it doesn't guarantee anything, at least in its current form.
Attachment #570872 - Attachment is obsolete: true
Attachment #616359 - Flags: feedback?(cbrocious)
Attachment #616359 - Flags: feedback?(bjacob)
Comment on attachment 616359 [details] [diff] [review]
Proof of concept

Review of attachment 616359 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.cpp
@@ +461,5 @@
>          fGetIntegerv(LOCAL_GL_MAX_RENDERBUFFER_SIZE, &mMaxRenderbufferSize);
>  
> +        if (IsExtensionSupported(ARB_sync)) {
> +            mUseClientWaitSync = Preferences::GetBool("gl.use-client-wait", true);
> +            mUseServerWaitSync = Preferences::GetBool("gl.use-server-wait", false);

With OMTC, we're now creating GL contexts off the main thread, so we can't access prefs from here anymore. To work around this problem, I've been extending gfxPlatform. See bug 686735

::: gfx/gl/GLContext.h
@@ +1104,5 @@
> +    bool mUseServerWaitSync;
> +
> +    void FenceCleanSync() {
> +        if (!mUseClientWaitSync && !mUseServerWaitSync) {
> +            printf_stderr("No sync needed, lol\n");

Remove printfs. This also seems like it would be too verbose even for debug builds. Feel free to implement a gfx.verbose mode where you could go crazy with printfs.

@@ +1111,5 @@
> +
> +        if (mCleanSync)
> +            fDeleteSync(mCleanSync);
> +
> +        mCleanSync = fFenceSync(LOCAL_GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

I don't feel competent enough right now to review code that uses FenceSync (the previous patch was just _exposing_ it). If you can't find someone else easily, I'll read the specs. let me know.
Attachment #616359 - Flags: feedback?(bjacob) → feedback-
No longer blocks: omtc
Blocks: omtc
Comment on attachment 616359 [details] [diff] [review]
Proof of concept

Agree with what bjacob said. In addition, this seems to have no fallback in the case that syncs aren't in use in WaitCleanSync.  Also, do we have information on how often server syncs are actually server syncs rather than just happening on the client?
Attachment #616359 - Flags: feedback?(cbrocious) → feedback-
blocking-basecamp: --- → ?
We're OK with what we're currently doing for B2G.
blocking-basecamp: ? → -
We fixed this in various places for different platforms. It's hard to dupe this against anything in particular, but this is effectively done.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: