Closed Bug 618898 (dxgl) Opened 14 years ago Closed 9 years ago

use NVX d3d-gl interop to get faster WebGL rendering

Categories

(Core :: Graphics: CanvasWebGL, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45

People

(Reporter: vlad, Assigned: BenWa)

References

(Blocks 1 open bug, )

Details

Attachments

(4 files, 8 obsolete files)

Attached patch use nvx dx-gl interop for webgl (obsolete) — Splinter Review
Recent nvidia drivers support OpenGL-D3D object interop.  This can significantly speed up WebGL, especially at large canvas sizes, by removing the ReadPixels & subsequent texture upload.

This could work fine on both d3d9 and d3d10, but the initial patch just makes it work with d3d10 layers.  Additional work for d3d9 layers is needed, but is relatively simple (just needs a texture created from our global device).

The way this patch works is to create a D3D9 device solely so that we can create a D3D9 Texture to interop with, because the NVX interop currently supports d3d9 only.  We create that texture with a share handle, then we open it from d3d10 so that we can use it from there.  It's a bit roundabout, but seems to work quite well (and I've confirmed that the dobule-interop of d3d10<->d3d9<->gl should work fine with NV).

The patch is built on top of bug 618892, but doesn't really depend on it.

Asking bas for some feedback on the d3d9 device manager changes.
Attachment #497306 - Flags: feedback?(bas.schouten)
Attachment #497306 - Attachment is patch: true
Attachment #497306 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 497306 [details] [diff] [review]
use nvx dx-gl interop for webgl

I think in theory this looks fine.

I wonder how this performs performance-wise compared to ANGLE on D3D9 though?

It seems D3D9 gives us some added security benefits in the sense that the HLSL compiler seems a lot more robust than some of the GLSL compiler that lives in the graphics UMD. Additionally the HLSL compiler can in the future be run inside the content process away from the graphics UMD which presumably will always require privileges.

Additionally the bytecode coming from the HLSL compiler goes through another extensive validation pass inside D3D9 (I played with this last week and it seems practically impossible to do anything evil to it even if you're injecting evil bytecode sequences directly).

It seems like until NVidia has done security testing on their GLSL compiler (which they told us they hadn't I believe last time they spoke to us) this provides us with some nice added security barriers after the ANGLE validator.

Having said that if using the GL NVidia driver directly is faster we should definitely do that.

I also wonder how this all works with the Optimus architecture.
Attachment #497306 - Flags: feedback?(bas.schouten) → feedback+
(In reply to comment #1)
> Comment on attachment 497306 [details] [diff] [review]
> use nvx dx-gl interop for webgl
> 
> I think in theory this looks fine.
> 
> I wonder how this performs performance-wise compared to ANGLE on D3D9 though?

Tons faster.  There are some issues in ANGLE in certain situations -- especially since it often has to convert GL vertex arrays to D3D streams/structs, which often involves lots of copying.  Once that's fixed (they're working on it), we can see where things are at -- as you say, there are advantages to always going through ANGLE.  But the perf difference is staggering in some cases (70+fps -> 17fps on 1920x1200 aquarium benchmark from nvx interop -> angle/pbuffer interup).

> It seems D3D9 gives us some added security benefits in the sense that the HLSL
> compiler seems a lot more robust than some of the GLSL compiler that lives in
> the graphics UMD. Additionally the HLSL compiler can in the future be run
> inside the content process away from the graphics UMD which presumably will
> always require privileges.

We can (will?) be able to do that with GL actually, since there is the possibility of doing separate compile-to-bytecode and load-bytecode steps with GL.
Is this intended for Firefox 4.0?
Since the patch already has r+ and the fact that Fx5 is turning out to be performance tuneup release, and light on new features. Wouldn't this patch be good for Fx5? The window for Fx5 will close soon so, better earlier than later.
Thanks for the heads up, it seems that this was at risk of getting forgotten. CC'ing some people to limit the risk that that happens.
The patch doesn't apply cleanly anymore. Only one line fails though, easy to fix.

Also the patch doesn't work, because there's a typo in the code.
This line:
{ (PRFuncPtr*) &fDXUnlockObjectsNVX, { "wglDXUnlockObjectNVX", NULL } },

should be:
{ (PRFuncPtr*) &fDXUnlockObjectsNVX, { "wglDXUnlockObjectsNVX", NULL } },

Otherwise the interop doesn't work.

Also the extension isn't experimental any more, so you can probably drop the NVX for NV
In the latest Nvidia 275.xx beta drivers(WHQL and final release soon), there's a newer Nvidia GL-DX interop extension, WGL_NV_DX_interop2. Perhaps Mozilla could use the newer extension in a future Firefox release.

http://www.nvidia.com/object/winxp-275.27-beta-driver.html

http://www.nvidia.com/object/win7-winvista-64bit-275.27-beta-driver.html

http://www.nvidia.com/object/notebook-win7-winvista-64bit-275.27-beta-driver.html
Attachment #497306 - Flags: review?(jgilbert)
WGL_NS_DX_interop2 just expands the initial extension to handle D3D10/11.
It is available here: http://www.opengl.org/registry/specs/NV/DX_interop2.txt
Assignee: vladimir → jgilbert
Status: NEW → ASSIGNED
This is built on bug 716859.

Performance on WebGL pasta is ~31-32 fps with ANGLE, ~30-32 fps with native GL and the interop.
Attachment #497306 - Attachment is obsolete: true
Attachment #497306 - Flags: review?(jgilbert)
Attachment #685397 - Flags: feedback?(bas)
Asking for feedback from Bas about SharedSurface_D3D10Interop::Create, and making sure it's doing reasonable things in d3d land.
Hm, if it's only hitting 30fps, you may want another benchmark... at least something that was hitting 60fps on either case to make sure it still hits 60fps.  I just realized actually, I want to add a target fps to my rAF update patch (though maybe the pref is enough?) so that we can do a target fps of 200fps or 1000fps so we can get the max rate and compare things that way.
I don't think we should ever benchmark something that's flirting with the 60fps cap when the two are so close together. If we remove the cap (or shift it to something sufficiently high), then we can test whatever we want.

It's true though, that to hit only 30fps on my machine means it'll be so GPU bound as to hide anything that's not really quite bad. (like readback)

If we can get a pref for max-fps, that'd be ideal. It's a prevalent configuration dimension in gaming.
(In reply to Jeff Gilbert [:jgilbert] from comment #10)
> Created attachment 685397 [details] [diff] [review]
> Add support for d3d10 path for comparison testing.
> 
> This is built on bug 716859.
> 
> Performance on WebGL pasta is ~31-32 fps with ANGLE, ~30-32 fps with native
> GL and the interop.

Are you saying it's -not- faster with Native GL?
(In reply to Bas Schouten (:bas.schouten) from comment #14)
> (In reply to Jeff Gilbert [:jgilbert] from comment #10)
> > Created attachment 685397 [details] [diff] [review]
> > Add support for d3d10 path for comparison testing.
> > 
> > This is built on bug 716859.
> > 
> > Performance on WebGL pasta is ~31-32 fps with ANGLE, ~30-32 fps with native
> > GL and the interop.
> 
> Are you saying it's -not- faster with Native GL?

Correct, though I'm running a old 275.33 nvidia driver. For this demo, at least.
It's probably also notable that this is a Quadro, not a GeForce.
Depends on: 716859
This will really only be a big win on XP, too -- I mean with ANGLE on your d3d10 machine, you're not doing readback, right?  Or did you force readback to happen?

Also, if you set layout.frame_rate to something like 200 and restart, rAF will try to hit 200 fps.  But some of the frameworks do internal rate limiting as well, so you still might not see faster than 60fps.  three.js I think limits to 60fps, though you can probably hack that out in the source.. not sure about others.  Just depends on finding some content with a fps display that's frame rate limited.
On WebGL Aquarium, layout.frame_rate=200 yields about 103 fps with ANGLE and about 110fps with GL. My lock code here seems messed up, though, and results in weird rendering issues when the normal stream of webgl rendering is interrupted. Changing tabs seems to lock at least one of the buffers such that it's never updated again, for example.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #17)
> This will really only be a big win on XP, too -- I mean with ANGLE on your
> d3d10 machine, you're not doing readback, right?  Or did you force readback
> to happen?
Yes, doing the d3d9 version of this will win us full acceleration on XP.
With ANGLE, we're using ANGLE's d3d share handle extension, not readback.
I can force readback to happen, but that tanks framerate, as we know.
(In reply to Jeff Gilbert [:jgilbert] from comment #19)
> Yes, doing the d3d9 version of this will win us full acceleration on XP.
> With ANGLE, we're using ANGLE's d3d share handle extension, not readback.
> I can force readback to happen, but that tanks framerate, as we know.

Right; the main driver for this is really XP, though being able to optionally skip ANGLE is nice (though it might lead to stability issues given the experimental extension, we'll see!).  So I suppose seeing the same perf between ANGLE and OpenGL on d3d10 systems is a good sign -- means that GL/NVX is as efficient (or slightly more) than going through ANGLE.
(In reply to NVD from comment #21)
> Nvidia has released GeForce 310.70 WHQL drivers. It has significant amount
> of new GL_OES extensions.
> 
> http://www.nvidia.com/object/winxp-310.70-whql-driver.html
> 
> http://www.nvidia.com/object/win8-win7-winvista-64bit-310.70-whql-driver.html
> 
> http://www.nvidia.com/object/notebook-win8-win7-winvista-64bit-310.70-whql-
> driver.html

I don't see the list of new extensions anywhere, but in general, OES exts on desktop will not be useful to us until we are capable of using native GLES2 on desktop. I filed bug 822518 for this.
Jeff, here's the list.

http://www.geeks3d.com/20121023/nvidia-r310-33-beta-graphics-drivers-with-opengl-4-3-support-for-windows/

Except for the extensions below, they were removed from 310.70 WHQL.

GL_OES_compressed_ETC2_RGB8_texture
GL_OES_compressed_ETC2_sRGB8_texture
GL_OES_compressed_ETC2_punchthroughA_RGBA8_texture
GL_OES_compressed_ETC2_punchthroughA_sRGB8_alpha_texture
GL_OES_compressed_ETC2_RGBA8_texture
GL_OES_compressed_ETC2_sRGB8_alpha8_texture
GL_OES_compressed_EAC_R11_unsigned_texture
GL_OES_compressed_EAC_R11_signed_texture
GL_OES_compressed_EAC_RG11_unsigned_texture
GL_OES_compressed_EAC_RG11_signed_texture
Those are just adding the GLES compressed texture formats (including ETC2, which is good) to the desktop.  This is all good for WebGL.  The other OES extensions don't make sense to expose via desktop GL; I'm not surprised that they were removed, as having them visible seems like it was a bug.
Blocks: 756930
Attachment #685397 - Flags: feedback?(bas)
Here's my attempt to port the D3D10 version to OMTC D3D11 on Windows to avoid the slow read back when rendering to OGL via WebGL and compositor is D3D11.
Attachment #8581468 - Flags: review?(jgilbert)
Attachment #8581468 - Attachment is obsolete: true
Attachment #8581468 - Flags: review?(jgilbert)
Attachment #8581472 - Flags: review?(jgilbert)
Sorry for letting this sit. I'm looking to get fast E10S compositing running, and this would be more code to immediately port over. I recommend we wait on this a little longer to see if we can't just rebase it directly on the new stuff.

Does this block anything for you?
Flags: needinfo?(dglastonbury)
Nope. I'm just going through my branches and recording everything that I've changed in patches so we can track them.

Do you want to remove the review request?
Flags: needinfo?(dglastonbury) → needinfo?(jgilbert)
Yeah, let's mothball this temporarily.
Flags: needinfo?(jgilbert)
Comment on attachment 8581472 [details] [diff] [review]
Use DX_NV_Interop2 for faster sharing.

Mothballing for now.
Attachment #8581472 - Flags: review?(jgilbert)
Attached patch Part 1: Add WGL_NV_DX_interop (obsolete) — Splinter Review
Let's land the support for WGL_NV_DX_interop first, while we work on the rest.
Attachment #8671557 - Flags: review?(jgilbert)
Comment on attachment 8671557 [details] [diff] [review]
Part 1: Add WGL_NV_DX_interop

Review of attachment 8671557 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLConsts.h
@@ +5951,5 @@
>  
> +// WGL_NV_DX_interop
> +#define LOCAL_WGL_ACCESS_READ_ONLY                           0x0000
> +#define LOCAL_WGL_ACCESS_READ_WRITE                          0x0001
> +#define LOCAL_WGL_ACCESS_WRITE_DISCARD                       0x0002

Did you get these by regenerating this file? This is a generated file. Add non-generated entries to GLDefs.h.

::: gfx/gl/GLContextProviderWGL.cpp
@@ +144,5 @@
>          NS_WARNING("wglMakeCurrent failed");
>          return false;
>      }
>  
> +    GLLibraryLoader::PlatformLookupFunction lookupFunc =

A `const` would be nice.
Attachment #8671557 - Flags: review?(jgilbert) → review+
Attached patch Part 1: Add WGL_NV_DX_interop (obsolete) — Splinter Review
Attachment #8671557 - Attachment is obsolete: true
Attachment #8671590 - Flags: review+
Assignee: jgilbert → nobody
Status: ASSIGNED → NEW
Attached patch patch (obsolete) — Splinter Review
Assignee: nobody → bgirard
Attachment #8671590 - Attachment is obsolete: true
Status: NEW → ASSIGNED
There's some work that happened on this for GDC that on this branch:
https://github.com/jdashg/gecko-dev/commits/demo-hacks

Pulling in that work into patches we can land behind a preference for now so that it stop rotting.
Alias: dxgl
I've made some progress on this but I'm getting FRAMEBUFFER_UNSUPPORTED in ReadBuffer::Create under GLScreenBuffer::CreateRead. It's calling AttachBuffersToFB() with a colorRB and not septh/stencil to target 0.
This will hit the error I have above but I'd like to get this in the tree where it will finally stop rotting. Next week I'll check if we get errors when creating mProdRB that might be the cause.
Attachment #8678327 - Flags: review?(jgilbert)
Keywords: leave-open
Dan might help me take a look. I can't figure out why this wont work since I'm passing in and calling the APIs the same way as a working build on the same hardware. There must be a difference but I can't put my finger on it.

Part 2 posted above should do. The relevant failure is at gl->IsFramebufferComplete(fb) in Readbuffer::Create()
Flags: needinfo?(dglastonbury)
Benoit,

This works for me. I copied most of SharedSurfaceANGLE and made a small change to GLScreenBuffer. PTAL.
Attachment #8683935 - Flags: feedback?(bgirard)
Assignee: bgirard → dglastonbury
Back to you, Benoit.
Assignee: dglastonbury → bgirard
Flags: needinfo?(dglastonbury)
Comment on attachment 8683935 [details] [diff] [review]
Fixed up BenWa's port of patch to work on my local machine. f=bgirard

Cool, I confirmed that it works. I guess all that's left for now it to land it behind a preference.
Attachment #8683935 - Flags: feedback?(bgirard) → feedback+
Attached patch patch (obsolete) — Splinter Review
Attachment #8677028 - Attachment is obsolete: true
Attachment #8678327 - Attachment is obsolete: true
Attachment #8683935 - Attachment is obsolete: true
Attachment #8678327 - Flags: review?(jgilbert)
Attachment #8685011 - Flags: review?(jgilbert)
I'm done very limited testing but the performance seems slightly better than D3D11 angle. I need more data to confirm.
Comment on attachment 8685011 [details] [diff] [review]
patch

Review of attachment 8685011 [details] [diff] [review]:
-----------------------------------------------------------------

We're close!

::: gfx/gl/GLConsts.h
@@ +5670,5 @@
>  
> +// WGL_NV_DX_interop
> +#define LOCAL_WGL_ACCESS_READ_ONLY                           0x0000
> +#define LOCAL_WGL_ACCESS_READ_WRITE                          0x0001
> +#define LOCAL_WGL_ACCESS_WRITE_DISCARD                       0x0002

This is a generated file. Unless you got this from updating and regenerating the file, these should go in GLDefs.h. (r-)

::: gfx/gl/GLScreenBuffer.cpp
@@ +105,5 @@
>                      factory = SurfaceFactory_ANGLEShareHandle::Create(gl, caps, forwarder, flags);
>                  }
> +
> +                if (!factory && gfxPrefs::WebGLDXGLEnabled()) {
> +                  factory = SurfaceFactory_D3D11Interop::Create(gl, caps, forwarder, flags);

Bad indent.

::: gfx/gl/SharedSurfaceD3D11Interop.cpp
@@ +149,5 @@
> +    ~DXGLDevice() {
> +        if (!mWGL->fDXCloseDevice(mDXGLDeviceHandle)) {
> +            uint32_t error = GetLastError();
> +            printf_stderr("wglDXCloseDevice(0x%x) failed: GetLastError(): 0x%x\n",
> +                          mDXGLDeviceHandle, error);

MOZ_ASSERT here?

@@ +160,5 @@
> +        if (!ret) {
> +            uint32_t error = GetLastError();
> +            printf_stderr("wglDXRegisterObject(0x%x, 0x%x, %u, 0x%x, 0x%x) failed:"
> +                          " GetLastError(): 0x%x\n", mDXGLDeviceHandle, dxObject, name,
> +                          type, access, error);

MOZ_ASSERT here and elsewhere below, unless we know (and have left a comment about) any good reasons this might fail.

@@ +205,5 @@
> +                                   GLContext* gl,
> +                                   const gfx::IntSize& size,
> +                                   bool hasAlpha)
> +{
> +    ID3D11Device* d3d = dxgl->mD3D;

const auto& d3d is nearly preferable.

@@ +221,5 @@
> +        return nullptr;
> +    }
> +
> +    IDXGIResource* textureDXGI;
> +    hr = textureD3D->QueryInterface(__uuidof(IDXGIResource), (void**)&textureDXGI);

Shouldn't this be:
RefPtr<IDXGIResource> textureDXGI;
hr = textureD3D->QueryInterface(getter_AddRefs(textureDXGI));

@@ +228,5 @@
> +        return nullptr;
> +    }
> +
> +    RefPtr<IDXGIKeyedMutex> keyedMutex;
> +    hr = textureD3D->QueryInterface((IDXGIKeyedMutex**) getter_AddRefs(keyedMutex));

Doesn't getter_Addrefs() return a T**?

@@ +235,5 @@
> +        return nullptr;
> +    }
> +
> +    HANDLE sharedHandle;
> +    textureDXGI->GetSharedHandle(&sharedHandle);

This is infallible?

@@ +243,5 @@
> +    gl->fGenRenderbuffers(1, &renderbufferGL);
> +    HANDLE objectWGL = dxgl->RegisterObject(textureD3D, renderbufferGL,
> +                                            LOCAL_GL_RENDERBUFFER,
> +                                            LOCAL_WGL_ACCESS_READ_WRITE);
> +                                            //LOCAL_WGL_ACCESS_WRITE_DISCARD_NV);

I don't actually know. Ideally we'd be WRITE_DISCARD, unless it marks it truly read-only. If it just makes an undefined-data scratch surface for us to write into, but we can still read from it, we want WRITE_DISCARD.

Does WRITE_DISCARD not work?

@@ +294,5 @@
> +
> +    mGL->fDeleteRenderbuffers(1, &mProdRB);
> +
> +    if (!mDXGL->UnregisterObject(mObjectWGL))
> +    NS_WARNING("Failed to release a DXGL object, possibly leaking it.");

Bad indent.

@@ +315,5 @@
> +
> +void
> +SharedSurface_D3D11Interop::Fence()
> +{
> +    mGL->fFinish();

Surely we don't need Finish() here?

We don't really have a Fence() option with this API. I do think that these Fence() funcs are about to be removed, so we can leave this for now.

@@ +336,5 @@
> +{
> +    MOZ_ASSERT(!mLockedForGL);
> +
> +    if (mKeyedMutex) {
> +        HRESULT hr = mKeyedMutex->AcquireSync(0, 10000);

Pull these magic numbers out into named constants.

@@ +339,5 @@
> +    if (mKeyedMutex) {
> +        HRESULT hr = mKeyedMutex->AcquireSync(0, 10000);
> +        if (hr == WAIT_TIMEOUT) {
> +            // Doubt we should do this? Maybe Wait for ever?
> +            MOZ_CRASH();

Uh, this is fine for now, though please put some text in MOZ_CRASH.

I think it's a bunch of work to make ProducerAcquire properly fallible.

@@ +363,5 @@
> +    // Now we have unlocked for GL, we can release to consumer.
> +    if (mKeyedMutex) {
> +        mKeyedMutex->ReleaseSync(0);
> +    }
> +    Fence();

Fence() calls glFinish. Surely we only need to do this if we don't have a keyed mutex!

@@ +387,5 @@
> +        }
> +    }
> +
> +    if (mConsumerKeyedMutex) {
> +        HRESULT hr = mConsumerKeyedMutex->AcquireSync(0, 10000);

Pull the magic numbers out into constants.

@@ +407,5 @@
> +SharedSurface_D3D11Interop::Fence_ContentThread_Impl()
> +{
> +    if (mFence) {
> +        MOZ_ASSERT(mGL->IsExtensionSupported(GLContext::NV_fence));
> +        mGL->fSetFence(mFence, LOCAL_GL_ALL_COMPLETED_NV);

I don't think we should (or need to) be mixing extensions like this.

::: gfx/gl/SharedSurfaceD3D11Interop.h
@@ +28,5 @@
> +    const HANDLE mSharedHandle;
> +    const RefPtr<ID3D11Texture2D> mTextureD3D;
> +    RefPtr<IDXGIKeyedMutex> mKeyedMutex;
> +    RefPtr<IDXGIKeyedMutex> mConsumerKeyedMutex;
> +    RefPtr<ID3D11Texture2D> mConsumerTexture;

I think we can safely get rid of the Consumer stuff, but I can do this in a follow-up.
Attachment #8685011 - Flags: review?(jgilbert) → review-
As noted in IRC I'm going to keep the glFinish in this landing and fix it in a follow-up. Finally getting a patch for this 5 years later :D.
Attached patch patchSplinter Review
Attachment #8685011 - Attachment is obsolete: true
Attachment #8687341 - Flags: review?(jgilbert)
Attachment #8687341 - Flags: review?(jgilbert) → review+
Keywords: leave-open
Target Milestone: --- → mozilla45
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1234938
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: