Refactor WebGL in preparation for proxying WebGL operations using GPU process for sandboxing
Categories
(Core :: Graphics: CanvasWebGL, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox74 | --- | verified |
People
(Reporter: handyman, Assigned: jgilbert)
References
(Blocks 1 open bug, Regressed 1 open bug)
Details
(Whiteboard: [fxr-ww][geckoview:fxr:p1])
Attachments
(11 files, 15 obsolete files)
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
46 bytes,
text/x-phabricator-request
|
Details | Review | |
21.90 KB,
text/plain
|
Details | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
This is the bug for the WebGL remoting effort. As part of the goal of removing graphics driver usage from the content process to allow for more intense sandboxing, WebGL needs to "run" its commands in another process -- that is the compositor GPU process.
Reporter | ||
Comment 1•6 years ago
|
||
A fairly messy stab at an implementation. This version doesn't work because the display stuff isn't wired properly (we don't yet have a spec for that part of the work so that is more about POC than a final implementation anyway). In broad strokes, I: * Determine if WebGL remoting is on using the pref webgl.is_remoting * Break WebGLContext into WebGLContextBase (common base for WebGLContexts), WebGLContext (for content process when not remoting or, when remoting, a GPU process object) and RemoteWebGLContext (for content-process objects). This setup preserves WebGLContext.cpp's history. * When remoting, create a bridge between the processes using CrossProcessCompositorBridge (I needed an actor that bridges content and GPU). The 'bridge' is a single-producer, single-consumer ProducerConsumerQueue (also implemented here). Once established, the PCQ handles communication (IPDL is not used at that point). A dedicated thread in the GPU process continually drains the command queue. * Initialization/rendering of the WebGL context is proxied from the content-side RemoteWebGLContext to the GPU-side WebGLContext in a way that preserves operations. The main methods involved are: GetCanvasLayer, SetDimensions, Begin/EndComposition, and DidRefresh. Other remoted methods support those operations or are basic WebGL commands (like Clear()). This part is broken -- specifically, the SetDimensions call is a WIP. The hangup is the 'remoted' shared surface (which is obviously a hack) as the GL Screen object. This is the POC work mentioned earlier.
Reporter | ||
Comment 2•6 years ago
|
||
For moving stuff into/out of IPDL messages. Useful for maintaining proper memory behavior for the ProducerConsumerQueues in part 2.
Reporter | ||
Comment 3•6 years ago
|
||
Reporter | ||
Comment 4•6 years ago
|
||
Reporter | ||
Comment 5•6 years ago
|
||
First working version. This properly displays the demo at https://mdn.github.io/webgl-examples/tutorial/sample1/ . I'm working on a writeup of this stuff now.
Reporter | ||
Comment 6•6 years ago
|
||
Posting another WIP. This still fails WebGL tests with an XPCOM crash (when not remoting WebGL): https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3603ae6070f32b0a403f56b562f7332ced2d814 The Android build is fixed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=690d69f146551c4b460dbbed99e0df918ee9612e
Reporter | ||
Comment 7•6 years ago
|
||
Refreshing patch for Alex. This version builds on all platforms and passes all tests (no webgl-remoting).
Reporter | ||
Comment 8•6 years ago
|
||
This patch makes UniquePtr<T> parameters available to IPDL for T with a ParamTraits implementation and a default constructor (the usual IPDL requirements). The semantics are that the object pointed to is consumed (destroyed) when it is passed as a parameter in an IPDL Send... message. It is also moved _into_ the Recv... function on the receiving process. The patch overloads Send... messages so that they can accept the UniquePtr by reference or by 'move' (rvalue-reference). The reference form is non-const because the UniquePtr will still be moved out of the reference by the Send operation. The object can not be used at that point.
Reporter | ||
Comment 9•6 years ago
|
||
This ProducerConsumerQueue (PCQ) is a lock-free, fixed-size, circular buffer that allows one process to produce data and another process to consume it. It can also be used intra-process. We use this later in this patch series as a fast command buffer for delegating WebGL commands out of the content process sandbox. The speed is valuable as it avoids blocking JS unless it is actually necessary. This is a single-producer/single-consumer queue, which means that up to one thread may be producing data and up to one thread consuming it at any time. Any more will lead to concurrency errors. A ProducerConsumerQueue object is just a data object that holds a Producer/Consumer pair. The user sends objects with the Producer and receives them with the Consumer. The Producer/Consumer communicate by a shared Shmem. Either the Producer or the Consumer can be sent to a remote process with IPDL. They cannot be copied. This implementation serializes POD types with memcpy. A future addition could serialize with a ParamTraits-like mechanism (but ParamTraits itself is very IPDL-specific). This patch offers an optional type-checker for the data inserted into the PCQ. It simply inserts an enum that specifies the type of the data that follows it. The ProducerConsumerQueue also allows requiring that data be contiguous or not. If a PCQ does not require contiguous data then it packs objects (by splitting them) when they won't fit at the end of the circular queue. When a PCQ forbids packing objects then the Consumer can pull objects by reference instead of copying them. Depends on D5174
Reporter | ||
Comment 10•6 years ago
|
||
Turns WebGLContext into a (mostly abstract) base, WebGLContextBase, the main implementation WebGLContext, and the proxy implementation for use in sandboxed processes, ProxyWebGLContext. ProxyWebGLContext is currently largely unimplemented. Depends on D5176
Reporter | ||
Comment 11•6 years ago
|
||
This is an inherited version of NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING -- useful for cases where e.g. a base class is declared that uses NS_INLINE_DECL_PURE_VIRTUAL_REFCOUNTING. Depends on D5177
Reporter | ||
Comment 12•6 years ago
|
||
Separates implemntations because only one (JS) uses the cycle-detecting garbage collector. We need to avoid this in remote WebGL instances where there is no JS context. The difference is almost entirely hidden from other code but there are a few methods that only make sense with JS objects, like WebGLContext::GetParameter. Depends on D5178
Reporter | ||
Comment 13•6 years ago
|
||
Provides a dedicated consumer RPC thread and Easy-to-use helpers for remotely executing a method on a WebGLContext. All parameters and the return value must be known to the ProducerConsumerQueue (see MAKE_TYPEINFO). These methods then syncronously or asynchronously serialize/deserialize parameters and establish a protocol for errors. Currently, the protocol simply crashes when an error is returned. Depends on D5179
Reporter | ||
Comment 14•6 years ago
|
||
If we created a ProxyWebGL*Context implementation as our content process WebGLContext (see Part 3) then the ProxyWebGL*Context::Create method will begin the process of creating a bridge with the compositor process using the IPDL CompositorBridge. Depends on D5181
Reporter | ||
Comment 15•6 years ago
|
||
Uses the AsyncCanvasRenderer facility to connect the content-side SharedSurface proxy used for composition with the GLFramebuffer used as the front-buffer in the compositor-side WebGLContext. Depends on D5182
Reporter | ||
Updated•6 years ago
|
Comment 16•6 years ago
|
||
Comment on attachment 9007002 [details] Bug 1477756: Part 3 - Add NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING_INHERITED Nathan Froyd [:froydnj] has approved the revision.
Reporter | ||
Comment 17•6 years ago
|
||
For kicks (and because its nearly impossible to know without a proof), I've been mocking up the behavior of the ProducerConsumerQueue in TLA+. Specifically, I've been focused on the memory order behavior around the atomic variables since this is where all of the major complexity arises. This behavior includes validating that local (cache) views of the data structure maintain consistency _and_ that compiler instruction reordering won't break it. I model the behavior of all possible memory order choices for the 5-line Producer Process algorithm-template and the 5-line Consumer Process algorithm-template the model uses. The expectation is that the current solution in code (which is fast) and that a solution that uses all memory_order_seq_csts (which is slow) will pass. The spec is about 95% done -- it does not properly model process-local views of the queue (only the read/write heads) but it does the rest. The comments should be clarifying. The Producer algorithm is assumed to fit this template, where PMO* are the memory orders we can set for the test: 0. DoSomethingP(); // takes 1 step and is independent of the other variables 1. w = write.load(PMO1); 2. r = read.load(PMO2); // PMO2 will be std::memory_order_acquire 3. M = Produce(r, w) Produce M values in M steps, asserting that they were formerly consumed. M is chosen to be a value that guarantees w never writes past r i.e. M < FreeBytes(r, w) NOTE: This means the queue can never be "full". This is because read/write give us no way to tell the difference between a full queue and an empty one. 4. write.store((w+M) % N, PMO3) unless the queue was full The consumer template, with CMO* as memory order placeholders, is: 0. DoSomethingC(); 1. w = write.load(CMO1); 2. r = read.load(CMO2); 3. M = Consume(r, w) Consume M values in M steps, asserting that they were formerly produced. M is chosen to be a value that guarantees r never reads past w i.e. M <= UsedBytes(r, w) 4. read.store((r+M) % N, CMO3) unless the queue was empty
Comment 18•6 years ago
|
||
Comment on attachment 9006995 [details] Bug 1477756: Part 1 - Allow UniquePtr with move semantics in IPDL messages Jed Davis [:jld] (⏰UTC-6) has approved the revision.
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 19•6 years ago
|
||
Comment 20•6 years ago
|
||
I'm not sure which patch this bug is from, but I came across what appears to be a bug. The canvas appears to not actually be rendering anything, or is transparent or something. This is demonstrated by the following program. Only the background color is visible, so you get a black box (you can change the background color to green to verify!), when in fact you should be getting a purple one. If you comment out the CSS you'll get a blank space. <!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <style type="text/css"> canvas { border: 2px solid black; background-color: black; } </style> </head> <body> <canvas id="glcanvas" width="640" height="480"></canvas> </body> <script type="text/javascript"> document.addEventListener("DOMContentLoaded", function() { const c = document.querySelector("#glcanvas"); const gl = c.getContext("webgl"); if (!gl) { alert("Unable to initialize GL"); return; } gl.clearColor(0.7, 0.2, 0.4, 1.0); gl.clear(gl.COLOR_BUFFER_BIT); }); </script> </html>
Updated•6 years ago
|
Reporter | ||
Comment 21•6 years ago
|
||
Tests for the ProducerConsumerQueue, which was introduced in Part 2 of this patch series. Depends on D5183
Updated•6 years ago
|
Reporter | ||
Comment 22•6 years ago
|
||
These patch updates merge us with trunk. A couple of serious changes to be aware of: * I removed the old part 1: uniqueptr ipdl integration. That's been moved to bug 1500219 -- the patches in that bug need to be imported _first_, before the patches in this bug, until it lands. * I've added a new part (part 8) with PCQ tests. They still need work.
Reporter | ||
Comment 23•6 years ago
|
||
I assumed Phabricator would do something unexpected with this -- I just noticed that it ignored my changes to the commit messages that re-numbered the patches. So we now have patches #2-8 and another new #8. I'll figure out whats happening here.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 24•6 years ago
|
||
I think I'm on to a way to disentangle CC from WebGL object lifetimes, but I'll need time tomorrow to provide a good example. (with VertexArray and Buffer) I still feel like this will be easy to rebase these patches onto. We'll see what you all think tomorrow.
Reporter | ||
Comment 25•6 years ago
|
||
Comment 26•5 years ago
|
||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 28•5 years ago
|
||
This is my latest, a bit cleaned up but obviously not well organized as the patch is 750K. The checkin comment has a bit of detail on how it works. This won't build or anything but it should hopefully illustrate any sticking points.
Reporter | ||
Comment 29•5 years ago
|
||
Work was moved to GitHub: https://github.com/davidp3/gecko-dev/tree/remote-webgl
Updated•5 years ago
|
Comment 30•5 years ago
|
||
Adding [geckoview:fxr:p1]
whiteboard tag to track this bug for GeckoView in Firefox Reality. Ehsan says Firefox Reality spends "around 22% of each frame waiting for the GL driver on the main thread; asynchronous WebGL would eliminate most of that waiting and increase the FPS by up to 28%."
Assignee | ||
Comment 31•5 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #30)
Adding
[geckoview:fxr:p1]
whiteboard tag to track this bug for GeckoView in Firefox Reality. Ehsan says Firefox Reality spends "around 22% of each frame waiting for the GL driver on the main thread; asynchronous WebGL would eliminate most of that waiting and increase the FPS by up to 28%."
I disagree with this assessment.
Assignee | ||
Comment 32•5 years ago
|
||
I expect initial performance of remoted WebGL to be similar-to-worse than present WebGL, at least for the initial landing.
There are no free lunches here.
Comment 33•5 years ago
|
||
Why is that? Because of the cost of the command serialization? And are you referring to frame rate overall or to the time spent on the main thread?
Assignee | ||
Comment 34•5 years ago
|
||
Extra copies, ser/deser costs, and various synchronizations. Initially, we're going to have a greater than ideal number of synchronization points, which we'll be burning down in time, and in response to profiling of content.
Assignee | ||
Comment 35•5 years ago
|
||
(P2 because it's not necessarily aiming for this next release, and P1 here would mean "aiming for this current Nightly")
Assignee | ||
Comment 36•4 years ago
|
||
Producer-consumer queues are often implemented with 2 semaphores that track how
much data/room is in the queue. This is not that -- that makes the processes
more synchronized than is necessary and requires us to be able to track how
much room is left, which qould require the semaphore to track the number of
bytes used/remaining, since we permit variable message lengths. And
CrossProcessSemaphores can only increment/decrement one at a time (thanks,
Windows), which would make byte count tracking with semaphore operations O(n)
per transaction instead of O(1).
This implementation avoids all of that -- careful analysis bounds the semaphore
values to a maximum of 2.
Assignee | ||
Comment 37•4 years ago
|
||
- Context loss using RAII
- Move Program reflection Client-side
Assignee | ||
Comment 38•4 years ago
|
||
Unity 2018 benchmark (https://files.unity3d.com/marcot/benchmarks2018.2.5f1/) runs roughly equivalent after fixing that I never marked LinkResult.pending = false. :)
Assignee | ||
Comment 39•4 years ago
|
||
(This is a combination of 31 commits)
-
Fix Linux compilation.
-
Fix mac compilation.
-
CI compile fixes.
-
printf's size_t is %zu. %tu would be unsigned ptrdiff_t.
-
No non-ref Maybe args.
-
MOZ_CRASH for noreturn
-
Handle implied texture sizes, rewrite comment stripping.
-
Replace e.g. WebGLProgramInner with simpler webgl::ProgramKeepAlive.
-
Bounce ValidateProgram call off driver.
-
Uniform name length limit, cubemap fb-attach, non-array uniforms, undersized texImage views.
-
alignas for uint8_t[sizeof(float)*N] pun buffers.
-
CC fixes?
-
Fill attrib0Active.
-
Repair max-warnings limit.
-
This is basically required in order for CI's logging to not explode.
-
Don't cache WebGLMemoryTracker.
-
Deleted prog/shader error, no texSubImage(null), client-side fingerprint resist for exts.
-
Fix GetUniformIndices and MakeRangeFromView.
-
CC Traverse base class from within derived class to fix leaking the world. :(
-
PauseTransformFeedback
-
TexImage video fastpath
-
GetFragLocation for arrays
-
Forbid BindBufferRange during TF
-
Mark tests and fix RBAB query and test.
-
Change(!) query deletion behavior to match spec.
-
Mark conformance2/query/query.html failing for now.
-
Implicitly EndQuery on DeleteQuery while spec is in flux.
-
Fix error code for test.
-
RAII LruPosition for WebGL context limit.
-
Include std::list.
-
Mark CompileResult and LinkResult.pending as false when retrieved.
-
Hold strong-ref to NotLostData during Run<> to prevent LoseContext=>UAF.
-
Don't assume GetUniformLocation(foo+'[0]') means foo is an array.
-
Don't assume !mCanvasElement means !!mOffscreenCanvas.
-
Handle composition while context-lost.
-
All non-value-init members must be const or have inline init.
-
Mark passing tests on Linux.
Assignee | ||
Comment 40•4 years ago
|
||
- Revert some partial webgl+oop+vr code.
- More missing FuncScope.
- Fix compile errors.
- Refactor some ifdef'd code to branch and compile on all platforms.
- -Wno-error=unused-result for GCC to allow for (void)MustUse().
Reporter | ||
Comment 41•4 years ago
|
||
Moved some things, removed some things, blah blah blah.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment 42•4 years ago
|
||
Pushed by jgilbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/71c122ac0ca7 Initial out-of-process WebGL implementation. r=mccr8,handyman https://hg.mozilla.org/integration/autoland/rev/ccfa767dba64 Client-side bindings mirror for precise CC, and merge similar codepaths. r=handyman https://hg.mozilla.org/integration/autoland/rev/7e2a2b1b416f Fix all webgl regression tests according to CI. r=handyman https://hg.mozilla.org/integration/autoland/rev/e1b0906509ef Fix non-webgl CI tests. r=handyman
Comment 43•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/71c122ac0ca7
https://hg.mozilla.org/mozilla-central/rev/ccfa767dba64
https://hg.mozilla.org/mozilla-central/rev/7e2a2b1b416f
https://hg.mozilla.org/mozilla-central/rev/e1b0906509ef
Assignee | ||
Updated•4 years ago
|
Comment 45•4 years ago
|
||
This feels like something we should get QA smoketesting ASAP since my understanding is that we don't have a lot of options with respect to backing out or disabling if major bugs are found before 74 goes to release.
Comment 46•4 years ago
|
||
Hi, Are there any steps or test pages we could use in order to verify this enhancement ?
Assignee | ||
Comment 47•4 years ago
|
||
Nothing specifically. https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html is the online version of the conformance tests, for which this shouldn't crash, and shouldn't cause toooo many failures. We run this on CI though.
Google Maps might be worth testing, but we expect a ton of bug reports if anything happens there. We expect to have pretty good verification by staying on top of bug and crash triage here.
Comment 48•4 years ago
|
||
I've tested this enhancement on Windows 10 and Mac 10.15.3 using the webGL conformance tests in Beta 74.0b7 and I got the following results:
Macintosh -
Test Summary: FAIL (2753 tests):
290846 subtests ran in 3571.65 seconds
PASSED: 2538 tests, 280536 subtests
NOT PASSED: 215 tests, 10310 subtests
FAILED: 215 tests, 10310 subtests
TIMED OUT: 0 tests, 0 subtests
SKIPPED: 0 tests, 0 subtests
Windows 10 -
Test Summary: FAIL (2753 tests):
289144 subtests ran in 7597.37 seconds
PASSED: 2670 tests, 284438 subtests
NOT PASSED: 83 tests, 4706 subtests
FAILED: 58 tests, 4681 subtests
TIMED OUT: 25 tests, 25 subtests
SKIPPED: 0 tests, 0 subtests
I've also tested Google Maps and I couldnt find any graphical or performance issues on either Mac or Windows.
Please note that on Ubuntu 16.04 the "Run Tests" button was greyed out, but I did test Google Maps and I couldn't find any noticeable issues.
I'm not sure what too many failures means but please let me know if we can mark this issue verified as Fixed based on these results.
Assignee | ||
Comment 49•4 years ago
|
||
That's great, thanks.
Comment 50•4 years ago
|
||
Based on Comment 49 I will update the tracking flag for this issue.
Description
•