Closed Bug 1477756 (webgl-ipc-refactor) Opened 6 years ago Closed 4 years ago

Refactor WebGL in preparation for proxying WebGL operations using GPU process for sandboxing

Categories

(Core :: Graphics: CanvasWebGL, enhancement, P1)

enhancement

Tracking

()

VERIFIED FIXED
mozilla74
Tracking Status
firefox74 --- verified

People

(Reporter: handyman, Assigned: jgilbert)

References

(Blocks 1 open bug, Regressed 1 open bug)

Details

(Whiteboard: [fxr-ww][geckoview:fxr:p1])

Attachments

(11 files, 15 obsolete files)

46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
21.90 KB, text/plain
Details
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
This is the bug for the WebGL remoting effort.  As part of the goal of removing graphics driver usage from the content process to allow for more intense sandboxing, WebGL needs to "run" its commands in another process -- that is the compositor GPU process.
A fairly messy stab at an implementation.  This version doesn't work because the display stuff isn't wired properly (we don't yet have a spec for that part of the work so that is more about POC than a final implementation anyway).

In broad strokes, I:

* Determine if WebGL remoting is on using the pref webgl.is_remoting
* Break WebGLContext into WebGLContextBase (common base for WebGLContexts), WebGLContext (for content process when not remoting or, when remoting, a GPU process object) and RemoteWebGLContext (for content-process objects).  This setup preserves WebGLContext.cpp's history.
* When remoting, create a bridge between the processes using CrossProcessCompositorBridge (I needed an actor that bridges content and GPU).  The 'bridge' is a single-producer, single-consumer ProducerConsumerQueue (also implemented here).  Once established, the PCQ handles communication (IPDL is not used at that point).  A dedicated thread in the GPU process continually drains the command queue.
* Initialization/rendering of the WebGL context is proxied from the content-side RemoteWebGLContext to the GPU-side WebGLContext in a way that preserves operations.  The main methods involved are: GetCanvasLayer, SetDimensions, Begin/EndComposition, and DidRefresh.  Other remoted methods support those operations or are basic WebGL commands (like Clear()).  This part is broken -- specifically, the SetDimensions call is a WIP.  The hangup is the 'remoted' shared surface (which is obviously a hack) as the GL Screen object.  This is the POC work mentioned earlier.
For moving stuff into/out of IPDL messages.  Useful for maintaining proper memory behavior for the ProducerConsumerQueues in part 2.
Attachment #8994256 - Attachment is obsolete: true
Attachment #8996759 - Attachment is obsolete: true
First working version.  This properly displays the demo at https://mdn.github.io/webgl-examples/tutorial/sample1/ .  I'm working on a writeup of this stuff now.
Attachment #8998902 - Attachment is obsolete: true
Attached patch WIP: proxy webgl (obsolete) — Splinter Review
Posting another WIP.  This still fails WebGL tests with an XPCOM crash (when not remoting WebGL):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3603ae6070f32b0a403f56b562f7332ced2d814

The Android build is fixed:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=690d69f146551c4b460dbbed99e0df918ee9612e
Attachment #8994258 - Attachment is obsolete: true
Attachment #8999681 - Attachment is obsolete: true
Attached patch WIP: proxy webgl (obsolete) — Splinter Review
Refreshing patch for Alex.  This version builds on all platforms and passes all tests (no webgl-remoting).
Attachment #9004948 - Attachment is obsolete: true
This patch makes UniquePtr<T> parameters available to IPDL for T with a ParamTraits implementation and a default constructor (the usual IPDL requirements).  The semantics are that the object pointed to is consumed (destroyed) when it is passed as a parameter in an IPDL Send... message.  It is also moved _into_ the Recv... function on the receiving process.

The patch overloads Send... messages so that they can accept the UniquePtr by reference or by 'move' (rvalue-reference).  The reference form is non-const because the UniquePtr will still be moved out of the reference by the Send operation.  The object can not be used at that point.
This ProducerConsumerQueue (PCQ) is a lock-free, fixed-size, circular buffer that allows one process to produce data and another process to consume it.  It can also be used intra-process.  We use this later in this patch series as a fast command buffer for delegating WebGL commands out of the content process sandbox.  The speed is valuable as it avoids blocking JS unless it is actually necessary.
This is a single-producer/single-consumer queue, which means that up to one thread may be producing data and up to one thread consuming it at any time.  Any more will lead to concurrency errors.
A ProducerConsumerQueue object is just a data object that holds a Producer/Consumer pair.  The user sends objects with the Producer and receives them with the Consumer.  The Producer/Consumer communicate by a shared Shmem.  Either the Producer or the Consumer can be sent to a remote process with IPDL.  They cannot be copied.
This implementation serializes POD types with memcpy.  A future addition could serialize with a ParamTraits-like mechanism (but ParamTraits itself is very IPDL-specific).
This patch offers an optional type-checker for the data inserted into the PCQ.  It simply inserts an enum that specifies the type of the data that follows it.
The ProducerConsumerQueue also allows requiring that data be contiguous or not.  If a PCQ does not require contiguous data then it packs objects (by splitting them) when they won't fit at the end of the circular queue.  When a PCQ forbids packing objects then the Consumer can pull objects by reference instead of copying them.

Depends on D5174
Turns WebGLContext into a (mostly abstract) base, WebGLContextBase, the main implementation WebGLContext, and the proxy implementation for use in sandboxed processes, ProxyWebGLContext.  ProxyWebGLContext is currently largely unimplemented.

Depends on D5176
This is an inherited version of NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING -- useful for cases where e.g. a base class is declared that uses NS_INLINE_DECL_PURE_VIRTUAL_REFCOUNTING.

Depends on D5177
Separates implemntations because only one (JS) uses the cycle-detecting garbage collector.  We need to avoid this in remote WebGL instances where there is no JS context.  The difference is almost entirely hidden from other code but there are a few methods that only make sense with JS objects, like WebGLContext::GetParameter.

Depends on D5178
Provides a dedicated consumer RPC thread and Easy-to-use helpers for remotely executing a method on a WebGLContext.  All parameters and the return value must be known to the ProducerConsumerQueue (see MAKE_TYPEINFO).  These methods then syncronously or asynchronously serialize/deserialize parameters and establish a protocol for errors.  Currently, the protocol simply crashes when an error is returned.

Depends on D5179
If we created a ProxyWebGL*Context implementation as our content process WebGLContext (see Part 3) then the ProxyWebGL*Context::Create method will begin the process of creating a bridge with the compositor process using the IPDL CompositorBridge.

Depends on D5181
Uses the AsyncCanvasRenderer facility to connect the content-side SharedSurface proxy used for composition with the GLFramebuffer used as the front-buffer in the compositor-side WebGLContext.

Depends on D5182
Attachment #9006319 - Attachment is obsolete: true
Comment on attachment 9007002 [details]
Bug 1477756: Part 3 - Add NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING_INHERITED

Nathan Froyd [:froydnj] has approved the revision.
Attachment #9007002 - Flags: review+
For kicks (and because its nearly impossible to know without a proof), I've been mocking up the behavior of the ProducerConsumerQueue in TLA+.  Specifically, I've been focused on the memory order behavior around the atomic variables since this is where all of the major complexity arises.  This behavior includes validating that local (cache) views of the data structure maintain consistency _and_ that compiler instruction reordering won't break it.  I model the behavior of all possible memory order choices for the 5-line Producer Process algorithm-template and the 5-line Consumer Process algorithm-template the model uses.  The expectation is that the current solution in code (which is fast) and that a solution that uses all memory_order_seq_csts (which is slow) will pass.  The spec is about 95% done -- it does not properly model process-local views of the queue (only the read/write heads) but it does the rest.

The comments should be clarifying.  The Producer algorithm is assumed to fit this template, where PMO* are the memory orders we can set for the test:

0. DoSomethingP();     // takes 1 step and is independent of the other variables 
1. w = write.load(PMO1);    
2. r = read.load(PMO2);     // PMO2 will be std::memory_order_acquire
3. M = Produce(r, w)
   Produce M values in M steps, asserting that they were formerly consumed.
   M is chosen to be a value that guarantees w never writes past r
   i.e.  M < FreeBytes(r, w)
   NOTE: This means the queue can never be "full".
   This is because read/write give us no way to tell the difference between a full queue and an empty one.
4. write.store((w+M) % N, PMO3) unless the queue was full

The consumer template, with CMO* as memory order placeholders, is:

0. DoSomethingC();
1. w = write.load(CMO1);
2. r = read.load(CMO2); 
3. M = Consume(r, w)
   Consume M values in M steps, asserting that they were formerly produced.
   M is chosen to be a value that guarantees r never reads past w
   i.e.  M <= UsedBytes(r, w)
4. read.store((r+M) % N, CMO3) unless the queue was empty
Comment on attachment 9006995 [details]
Bug 1477756: Part 1 - Allow UniquePtr with move semantics in IPDL messages

Jed Davis [:jld] (⏰UTC-6) has approved the revision.
Attachment #9006995 - Flags: review+
Attachment #9007846 - Attachment mime type: application/octet-stream → text/html
Attachment #9007846 - Attachment mime type: text/html → text/plain
Attachment #9007846 - Attachment is obsolete: true
I'm not sure which patch this bug is from, but I came across what appears to be a bug. The canvas appears to not actually be rendering anything, or is transparent or something. This is demonstrated by the following program. Only the background color is visible, so you get a black box (you can change the background color to green to verify!), when in fact you should be getting a purple one. If you comment out the CSS you'll get a blank space.

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8" />
        <style type="text/css">
            canvas {
                border: 2px solid black;
                background-color: black;
            }
        </style>
    </head>
    <body>
        <canvas id="glcanvas" width="640" height="480"></canvas>
    </body>

    <script type="text/javascript">
        document.addEventListener("DOMContentLoaded", function() {
            const c = document.querySelector("#glcanvas");
            const gl = c.getContext("webgl");
            if (!gl) {
                alert("Unable to initialize GL");
                return;
            }

            gl.clearColor(0.7, 0.2, 0.4, 1.0);
            gl.clear(gl.COLOR_BUFFER_BIT);
        });
    </script>
</html>
QA Contact: jmathies
QA Contact: jmathies
See Also: → 1500219
Tests for the ProducerConsumerQueue, which was introduced in Part 2 of this patch series.

Depends on D5183
Attachment #9006995 - Attachment is obsolete: true
These patch updates merge us with trunk.  A couple of serious changes to be aware of:

* I removed the old part 1: uniqueptr ipdl integration.  That's been moved to bug 1500219 -- the patches in that bug need to be imported _first_, before the patches in this bug, until it lands.

* I've added a new part (part 8) with PCQ tests.  They still need work.
I assumed Phabricator would do something unexpected with this -- I just noticed that it ignored my changes to the commit messages that re-numbered the patches.  So we now have patches #2-8 and another new #8.  I'll figure out whats happening here.
Attachment #9007000 - Attachment description: Bug 1477756: Part 2 - Implement ProducerConsumerQueue as a fast command buffer → Bug 1477756: Part 1 - Implement ProducerConsumerQueue as a fast command buffer
Attachment #9007001 - Attachment description: Bug 1477756: Part 3 - Refactor WebGLContext to add proxy implementation → Bug 1477756: Part 2 - Refactor WebGLContext to add proxy implementation
Attachment #9007002 - Attachment description: Bug 1477756: Part 4 - Add NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING_INHERITED → Bug 1477756: Part 3 - Add NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING_INHERITED
Attachment #9007004 - Attachment description: Bug 1477756: Part 5 - Refactor WebGLVertexArray into JS and non-JS implementations → Bug 1477756: Part 4 - Refactor WebGLVertexArray into JS and non-JS implementations
Attachment #9007006 - Attachment description: Bug 1477756: Part 6 - Create framework for a ProxyWebGLContext to execute methods on a remote WebGLContext → Bug 1477756: Part 5 - Create framework for a ProxyWebGLContext to execute methods on a remote WebGLContext
Attachment #9007007 - Attachment description: Bug 1477756: Part 7 - Establish ProxyWebGLContext link with compositor process if webgl.is_remoting → Bug 1477756: Part 6 - Establish ProxyWebGLContext link with compositor process if webgl.is_remoting
Attachment #9007009 - Attachment description: Bug 1477756: Part 8 - Integrate ProxyWebGLContext with the composition cycle → Bug 1477756: Part 7 - Integrate ProxyWebGLContext with the composition cycle
Alias: webgl-ipc
Component: Security: Process Sandboxing → Canvas: WebGL
I think I'm on to a way to disentangle CC from WebGL object lifetimes, but I'll need time tomorrow to provide a good example. (with VertexArray and Buffer)
I still feel like this will be easy to rebase these patches onto. We'll see what you all think tomorrow.
Attachment #9007000 - Attachment is obsolete: true
Attachment #9007001 - Attachment is obsolete: true
Attachment #9007002 - Attachment is obsolete: true
Attachment #9007004 - Attachment is obsolete: true
Attachment #9007006 - Attachment is obsolete: true
Attachment #9007007 - Attachment is obsolete: true
Attachment #9007009 - Attachment is obsolete: true
Attachment #9019499 - Attachment is obsolete: true
Attached patch WIP: Current remoting work (obsolete) — Splinter Review

This is my latest, a bit cleaned up but obviously not well organized as the patch is 750K. The checkin comment has a bit of detail on how it works. This won't build or anything but it should hopefully illustrate any sticking points.

Whiteboard: [fxr-ww]

Adding [geckoview:fxr:p1] whiteboard tag to track this bug for GeckoView in Firefox Reality. Ehsan says Firefox Reality spends "around 22% of each frame waiting for the GL driver on the main thread; asynchronous WebGL would eliminate most of that waiting and increase the FPS by up to 28%."

Whiteboard: [fxr-ww] → [fxr-ww][geckoview:fxr:p1]

(In reply to Chris Peterson [:cpeterson] from comment #30)

Adding [geckoview:fxr:p1] whiteboard tag to track this bug for GeckoView in Firefox Reality. Ehsan says Firefox Reality spends "around 22% of each frame waiting for the GL driver on the main thread; asynchronous WebGL would eliminate most of that waiting and increase the FPS by up to 28%."

I disagree with this assessment.

I expect initial performance of remoted WebGL to be similar-to-worse than present WebGL, at least for the initial landing.
There are no free lunches here.

Why is that? Because of the cost of the command serialization? And are you referring to frame rate overall or to the time spent on the main thread?

Extra copies, ser/deser costs, and various synchronizations. Initially, we're going to have a greater than ideal number of synchronization points, which we'll be burning down in time, and in response to profiling of content.

(P2 because it's not necessarily aiming for this next release, and P1 here would mean "aiming for this current Nightly")

Priority: P1 → P2
See Also: → 1546823

Producer-consumer queues are often implemented with 2 semaphores that track how
much data/room is in the queue. This is not that -- that makes the processes
more synchronized than is necessary and requires us to be able to track how
much room is left, which qould require the semaphore to track the number of
bytes used/remaining, since we permit variable message lengths. And
CrossProcessSemaphores can only increment/decrement one at a time (thanks,
Windows), which would make byte count tracking with semaphore operations O(n)
per transaction instead of O(1).

This implementation avoids all of that -- careful analysis bounds the semaphore
values to a maximum of 2.

  • Context loss using RAII
  • Move Program reflection Client-side

Unity 2018 benchmark (https://files.unity3d.com/marcot/benchmarks2018.2.5f1/) runs roughly equivalent after fixing that I never marked LinkResult.pending = false. :)

(This is a combination of 31 commits)

  • Fix Linux compilation.

  • Fix mac compilation.

  • CI compile fixes.

  • printf's size_t is %zu. %tu would be unsigned ptrdiff_t.

  • No non-ref Maybe args.

  • MOZ_CRASH for noreturn

  • Handle implied texture sizes, rewrite comment stripping.

  • Replace e.g. WebGLProgramInner with simpler webgl::ProgramKeepAlive.

  • Bounce ValidateProgram call off driver.

  • Uniform name length limit, cubemap fb-attach, non-array uniforms, undersized texImage views.

  • alignas for uint8_t[sizeof(float)*N] pun buffers.

  • CC fixes?

  • Fill attrib0Active.

  • Repair max-warnings limit.

  • This is basically required in order for CI's logging to not explode.

  • Don't cache WebGLMemoryTracker.

  • Deleted prog/shader error, no texSubImage(null), client-side fingerprint resist for exts.

  • Fix GetUniformIndices and MakeRangeFromView.

  • CC Traverse base class from within derived class to fix leaking the world. :(

  • PauseTransformFeedback

  • TexImage video fastpath

  • GetFragLocation for arrays

  • Forbid BindBufferRange during TF

  • Mark tests and fix RBAB query and test.

  • Change(!) query deletion behavior to match spec.

  • Mark conformance2/query/query.html failing for now.

  • Implicitly EndQuery on DeleteQuery while spec is in flux.

  • Fix error code for test.

  • RAII LruPosition for WebGL context limit.

  • Include std::list.

  • Mark CompileResult and LinkResult.pending as false when retrieved.

  • Hold strong-ref to NotLostData during Run<> to prevent LoseContext=>UAF.

  • Don't assume GetUniformLocation(foo+'[0]') means foo is an array.

  • Don't assume !mCanvasElement means !!mOffscreenCanvas.

  • Handle composition while context-lost.

  • All non-value-init members must be const or have inline init.

  • Mark passing tests on Linux.

  • Revert some partial webgl+oop+vr code.
  • More missing FuncScope.
  • Fix compile errors.
  • Refactor some ifdef'd code to branch and compile on all platforms.
  • -Wno-error=unused-result for GCC to allow for (void)MustUse().

Moved some things, removed some things, blah blah blah.

Blocks: 1605822
No longer blocks: 1605822
Assignee: davidp99 → jgilbert
Priority: P2 → P1
Summary: Proxy WebGL operations using GPU process for sandboxing → Refactor WebGL in preparation for proxying WebGL operations using GPU process for sandboxing
Attachment #9020139 - Attachment is obsolete: true
Attachment #9007001 - Attachment is obsolete: false
Attachment #9007004 - Attachment is obsolete: false
Attachment #9007006 - Attachment is obsolete: false
Attachment #9007007 - Attachment is obsolete: false
Attachment #9007009 - Attachment is obsolete: false
Attachment #9026811 - Attachment is obsolete: true
Attachment #9040783 - Attachment is obsolete: true
Pushed by jgilbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/71c122ac0ca7
Initial out-of-process WebGL implementation. r=mccr8,handyman
https://hg.mozilla.org/integration/autoland/rev/ccfa767dba64
Client-side bindings mirror for precise CC, and merge similar codepaths. r=handyman
https://hg.mozilla.org/integration/autoland/rev/7e2a2b1b416f
Fix all webgl regression tests according to CI. r=handyman
https://hg.mozilla.org/integration/autoland/rev/e1b0906509ef
Fix non-webgl CI tests. r=handyman
Blocks: 1607940
Blocks: 1304507
Regressions: 1608309
Regressions: 1608311
Alias: webgl-ipc → webgl-ipc-refactor
Regressions: 1608365
Regressions: 1608500
Regressions: 1609006
Regressions: 1608330
Regressions: 1609908
Depends on: 1608235
No longer depends on: 1608235
See Also: → 1606830

This feels like something we should get QA smoketesting ASAP since my understanding is that we don't have a lot of options with respect to backing out or disabling if major bugs are found before 74 goes to release.

Flags: qe-verify+

Hi, Are there any steps or test pages we could use in order to verify this enhancement ?

Flags: needinfo?(jgilbert)

Nothing specifically. https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html is the online version of the conformance tests, for which this shouldn't crash, and shouldn't cause toooo many failures. We run this on CI though.
Google Maps might be worth testing, but we expect a ton of bug reports if anything happens there. We expect to have pretty good verification by staying on top of bug and crash triage here.

Flags: needinfo?(jgilbert)
Regressions: 1615814
Regressions: 1617512

I've tested this enhancement on Windows 10 and Mac 10.15.3 using the webGL conformance tests in Beta 74.0b7 and I got the following results:

Macintosh -
Test Summary: FAIL (2753 tests):
290846 subtests ran in 3571.65 seconds
PASSED: 2538 tests, 280536 subtests
NOT PASSED: 215 tests, 10310 subtests
FAILED: 215 tests, 10310 subtests
TIMED OUT: 0 tests, 0 subtests
SKIPPED: 0 tests, 0 subtests

Windows 10 -
Test Summary: FAIL (2753 tests):
289144 subtests ran in 7597.37 seconds
PASSED: 2670 tests, 284438 subtests
NOT PASSED: 83 tests, 4706 subtests
FAILED: 58 tests, 4681 subtests
TIMED OUT: 25 tests, 25 subtests
SKIPPED: 0 tests, 0 subtests

I've also tested Google Maps and I couldnt find any graphical or performance issues on either Mac or Windows.
Please note that on Ubuntu 16.04 the "Run Tests" button was greyed out, but I did test Google Maps and I couldn't find any noticeable issues.

I'm not sure what too many failures means but please let me know if we can mark this issue verified as Fixed based on these results.

Flags: needinfo?(jgilbert)
Regressions: 1618325

That's great, thanks.

Status: RESOLVED → VERIFIED
Flags: needinfo?(jgilbert)

Based on Comment 49 I will update the tracking flag for this issue.

Flags: qe-verify+
No longer regressions: 1618325
Regressions: 1624275
Regressions: 1628193
Regressions: 1636524
Regressions: 1654048
Regressions: 1675874
Regressions: 1657189
Regressions: 1794893
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: