Closed Bug 697831 Opened 13 years ago Closed 12 years ago

Accelerated layers should not use glFinish to synchronize textures across contexts

Tracking

()

Status:

RESOLVED WORKSFORME

Project Flags:

blocking-basecamp

People

(Reporter: jgilbert, Assigned: jgilbert)

References

Details

(Keywords: perf, regression)

Attachments

(1 file, 2 obsolete files)

Implements ARB_sync to allow for GPU-local signalling on OGL layers 13 years ago Kelsey Gilbert [:jgilbert] 9.66 KB, patch		Details \| Diff \| Splinter Review
Remove unnecessary glFlush/glFinish from GLContext 13 years ago Kelsey Gilbert [:jgilbert] 5.91 KB, patch	bjacob : review-	Details \| Diff \| Splinter Review
Proof of concept 12 years ago Kelsey Gilbert [:jgilbert] 8.67 KB, patch	Daeken : feedback- bjacob : feedback-	Details \| Diff \| Splinter Review

Kelsey Gilbert [:jgilbert]

Assignee

Description

•

13 years ago

For synchronizing textures across contexts, we should not be using these commands:

glFlush:
Only guarantees that the commands in the command buffer will be sent to the GPU in finite time. That is, it flushes the command buffer to the GPU, instead of possibly waiting for more commands. It does not guarantee *when* the commands will be completed, but it also does not block.

glFinish:
Flushes all commands to the GPU, and blocks until the commands are all 'fully realized'. This has to block until the GPU finishes, sends an indication to the CPU that rendering is complete, and which finally releases the block. This is slow because we not only wait for everything to complete rendering (what we want), but we also have to wait for reception of the notification of such. (slow)

As it stands now, we must use glFinish to assure rendering is completed, but this comes at a performance penalty. It should be noted that glFinish is not required for glReadPixels, or any of the GPU-local commands.

What we need to use is a method that acts as a GPU-local blocker for commands that depend rendering to an object in another context being completed, without having to interact with the CPU again.

This functionality is added by ARB_sync: http://www.opengl.org/registry/specs/ARB/sync.txt

Kelsey Gilbert [:jgilbert]

Assignee

Comment 1

•

13 years ago

Attached patch Implements ARB_sync to allow for GPU-local signalling on OGL layers (obsolete) — Details — Splinter Review

Needs testing on non-windows platforms, since windows doesn't support OGL layers.

Assignee: nobody → jgilbert

Status: NEW → ASSIGNED

Attachment #570872 - Flags: feedback?(bjacob)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 2

•

13 years ago

Comment on attachment 570872 [details] [diff] [review]
Implements ARB_sync to allow for GPU-local signalling on OGL layers

Yeah, this doesn't work yet.

Attachment #570872 - Flags: feedback?(bjacob)

Kelsey Gilbert [:jgilbert]

Assignee

Comment 3

•

13 years ago

At least, not on my mac machine, that is.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 4

•

13 years ago

glFlush considered relatively harmless, but we really should not use glFinish.

Summary: Accelerated layers should not use glFlush or glFinish to synchronize textures across contexts → Accelerated layers should not use glFinish to synchronize textures across contexts

Target Milestone: --- → mozilla10

Version: unspecified → Trunk

Kelsey Gilbert [:jgilbert]

Assignee

Updated

•

13 years ago

Target Milestone: mozilla10 → ---

Kelsey Gilbert [:jgilbert]

Assignee

Comment 5

•

13 years ago

Attached patch Remove unnecessary glFlush/glFinish from GLContext (obsolete) — Details — Splinter Review

Attachment #575375 - Flags: review?(bjacob)

Dan Mosedale (:dmosedale, :dmose)

Updated

•

13 years ago

Keywords: perf, regression

Benoit Jacob [:bjacob] (mostly away)

Comment 6

•

13 years ago

Comment on attachment 575375 [details] [diff] [review]
Remove unnecessary glFlush/glFinish from GLContext

Review of attachment 575375 [details] [diff] [review]:
-----------------------------------------------------------------

I just need a little explanation:

::: gfx/thebes/GLContext.h
@@ +1004,5 @@
>  
> +    void ReadyOffscreenTexture() {
> +        BlitDirtyFBOs();
> +        fFlush();
> +    }

I don't understand, didn't we say that glFlush doesn't really do anything useful for us?

Attachment #575375 - Flags: review?(bjacob) → review-

Kelsey Gilbert [:jgilbert]

Assignee

Comment 7

•

13 years ago

Bas had a good point that if glFlush is sufficient to get the context's commands to the hardware's serial buffer, we can guarantee that even commands across different contexts are executed in the order we want, and that the contents of the backing shared texture will be complete and valid for the layers context.

Unfortunately, upon re-reading the spec, it explicitly does not guarantee this. The main relevant section is D.3.3, specifically Rule 2:

While a container object C is bound, any changes made to the contents of
C’s attachments in the current context are guaranteed to be seen. To guarantee
seeing changes made in another context to objects attached to C, such changes
must be completed in that other context (see section D.3.1) prior to C being bound.
Changes made in another context but not determined to have completed as described
in section D.3.1, or after C is bound in the current context, are not guaranteed
to be seen.

Section D.3.1 reads:
The contents of an object T are considered to have been changed once a command
such as described in section D.3 has completed. Completion of a command 1 may
be determined either by calling Finish, or by calling FenceSync and executing a
WaitSync command on the associated sync object. The second method does not
require a round trip to the GL server and may be more efficient, particularly when
changes to T in one context must be known to have completed before executing
commands dependent on those changes in another context.

This wording is from OpenGL 4.2 spec, but OpenGL ES 2.0 spec has similar wording, though omitting fence/sync objects.

As such, our earlier (but not earliest) understanding was correct: We cannot rely on glFlush being sufficient for all implementations, though some implementations may guarantee that it can. Specifically, on ANGLE, glFlush uses a similar function in D3D, which Bas has told me *does* guarantee in-order executions of commands, though we should probably double-check. A number of texts relating to Apple's OGL implementations leads me to believe this is true for them as well.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 8

•

13 years ago

Basically, we can use glFlush if the platform's implementation explicitly supports it, but otherwise, as per spec, we must use either glFinish or ARB_sync.

A relevant link:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=279182

Bas Schouten (:bas.schouten)

Comment 9

•

13 years ago

(In reply to Jeff Gilbert [:jgilbert] from comment #7)
> As such, our earlier (but not earliest) understanding was correct: We cannot
> rely on glFlush being sufficient for all implementations, though some
> implementations may guarantee that it can. Specifically, on ANGLE, glFlush
> uses a similar function in D3D, which Bas has told me *does* guarantee
> in-order executions of commands, though we should probably double-check. A
> number of texts relating to Apple's OGL implementations leads me to believe
> this is true for them as well.

Just FYI, this isn't completely true, D3D9 does not have a real flush, the kind of flush it does have is more like a 'glFinish' (using a GPU Query and waiting for it to have executed). D3D10 -does- have a real Flush, which guarantees in-order execution with different D3D10 devices, but -not- across D3D10 and D3D9 devices.

Kelsey Gilbert [:jgilbert]

Assignee

Comment 10

•

13 years ago

I made a wiki page for this glFlush/glFinish stuff: https://wiki.mozilla.org/Platform/GFX/CrossContextSyncing

Kelsey Gilbert [:jgilbert]

Assignee

Updated

•

13 years ago

Depends on: 705024

Benoit Girard (:BenWa)

Updated

•

13 years ago

Blocks: omtc

Benoit Jacob [:bjacob] (mostly away)

Updated

•

13 years ago

Blocks: 716859

Benoit Girard (:BenWa)

Comment 11

•

13 years ago

Here's a profiling showing just how bad it is hurting us on:
https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/demos/google/nvidia-vertex-buffer-object/index.html

varium.fantasytalesonline.com/cleopatra/?report=AMIfv95h5nCbkSXxz9W_T9bUQFhogzwtYfpHL8PK8djmoDyaT7Z-ITkNNmA5ME7eQgIRb5IJsG2A3JVlDZCtwKfurHjN6bb7C0raCVRXClfoEylTFOmKv-JibmzVvdtb8MybnMUWdxguqNSzBYKVQobJYobUQAJ_SQ

We're spending 31% of our time waiting on glFinish.

Benoit Girard (:BenWa)

Updated

•

12 years ago

Depends on: 721115

Brad Lassey [:blassey] (use needinfo?)

Updated

•

12 years ago

Keywords: fennecnative-betablocker

Joe Drew (not getting mail)

Comment 12

•

12 years ago

This doesn't need to block beta for Fennec unless we are specifically hurt worse there than on other OSes. We will get this in good time with ARB_sync etc.

Keywords: fennecnative-betablocker

Kelsey Gilbert [:jgilbert]

Assignee

Updated

•

12 years ago

Depends on: 739421

Kelsey Gilbert [:jgilbert]

Assignee

Updated

•

12 years ago

Attachment #575375 - Attachment is obsolete: true

Kelsey Gilbert [:jgilbert]

Assignee

Comment 13

•

12 years ago

Attached patch Proof of concept — Details — Splinter Review

Here's a proof-of-concept for OGL layers which can be switch between Finish, ClientWaitSync, and WaitSync for resolving.

Note that I don't think we can get away with using (Server)WaitSync since it doesn't guarantee anything, at least in its current form.

Attachment #570872 - Attachment is obsolete: true

Attachment #616359 - Flags: feedback?(cbrocious)

Attachment #616359 - Flags: feedback?(bjacob)

Benoit Jacob [:bjacob] (mostly away)

Comment 14

•

12 years ago

Comment on attachment 616359 [details] [diff] [review]
Proof of concept

Review of attachment 616359 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.cpp
@@ +461,5 @@
>          fGetIntegerv(LOCAL_GL_MAX_RENDERBUFFER_SIZE, &mMaxRenderbufferSize);
>  
> +        if (IsExtensionSupported(ARB_sync)) {
> +            mUseClientWaitSync = Preferences::GetBool("gl.use-client-wait", true);
> +            mUseServerWaitSync = Preferences::GetBool("gl.use-server-wait", false);

With OMTC, we're now creating GL contexts off the main thread, so we can't access prefs from here anymore. To work around this problem, I've been extending gfxPlatform. See bug 686735

::: gfx/gl/GLContext.h
@@ +1104,5 @@
> +    bool mUseServerWaitSync;
> +
> +    void FenceCleanSync() {
> +        if (!mUseClientWaitSync && !mUseServerWaitSync) {
> +            printf_stderr("No sync needed, lol\n");

Remove printfs. This also seems like it would be too verbose even for debug builds. Feel free to implement a gfx.verbose mode where you could go crazy with printfs.

@@ +1111,5 @@
> +
> +        if (mCleanSync)
> +            fDeleteSync(mCleanSync);
> +
> +        mCleanSync = fFenceSync(LOCAL_GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

I don't feel competent enough right now to review code that uses FenceSync (the previous patch was just _exposing_ it). If you can't find someone else easily, I'll read the specs. let me know.

Attachment #616359 - Flags: feedback?(bjacob) → feedback-

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

No longer blocks: omtc

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

Blocks: omtc

Cody Brocious [:Daeken]

Comment 15

•

12 years ago

Comment on attachment 616359 [details] [diff] [review]
Proof of concept

Agree with what bjacob said. In addition, this seems to have no fallback in the case that syncs aren't in use in WaitCleanSync.  Also, do we have information on how often server syncs are actually server syncs rather than just happening on the client?

Attachment #616359 - Flags: feedback?(cbrocious) → feedback-

Joe Drew (not getting mail)

Updated

•

12 years ago

blocking-basecamp: --- → ?

Joe Drew (not getting mail)

Comment 16

•

12 years ago

We're OK with what we're currently doing for B2G.

blocking-basecamp: ? → -

Kelsey Gilbert [:jgilbert]

Assignee

Comment 17

•

12 years ago

We fixed this in various places for different platforms. It's hard to dupe this against anything in particular, but this is effectively done.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.