Closed Bug 857895 Opened 11 years ago Closed 9 years ago

Render canvas contents on a separate thread

Categories

(Core :: Graphics: Canvas2D, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

References

Details

Attachments

(2 files, 9 obsolete files)

      No description provided.
Attached patch Moz2D Api additions (obsolete) — Splinter Review
Attachment #733172 - Flags: feedback?(bas)
Attached patch DrawTargetSkia implementation (obsolete) — Splinter Review
Attached patch CanvasRenderingContext2D changes (obsolete) — Splinter Review
This is all just a prototype, needs a lot of tidying up. Still crashes sometimes inside skia code, haven't figured out if that's my bug or theirs.

Haven't been able to figure out a way to get cross canvas drawing to be asynchronous, the SkPicture code makes a deep copy of the bitmap when we issue the draw command. Unless the image is marked as immutable, but we can't do that for our DT's.
Attachment #733174 - Flags: feedback?(roc)
Comment on attachment 733172 [details] [diff] [review]
Moz2D Api additions

Review of attachment 733172 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/2D.h
@@ +868,5 @@
> +class DelayedDrawTarget : public DrawTarget
> +{
> +public:
> +  virtual TemporaryRef<DrawTargetReplay> Finish() = 0;
> +};

These could use some documentation...
Comment on attachment 733174 [details] [diff] [review]
CanvasRenderingContext2D changes

Review of attachment 733174 [details] [diff] [review]:
-----------------------------------------------------------------

We actually kinda don't want to flush drawing commands in the stable state, because that's still going to block the main thread. Instead we want to wait for the next layer tree update (empty transaction or not) and flush then. I set you on the wrong path there, sorry.

Arguably we don't even really want to flush then; we want some kind of object that represents a snapshot of the state of the canvas when all currently-pending commands have flushed, but no future ones, and set that on the canvas layer and ensure the compositor gets the right state when it composites. That's a lot more complicated though (sounds like the problem of drawImage with a canvas source).

Some more documentation in this patch would be helpful.

I think we need to use a thread pool here. Preferably a thread pool we share with other parts of the system.

::: content/canvas/src/CanvasRenderingContext2D.cpp
@@ +1028,5 @@
> +CanvasRenderingContext2D::FinishDelayedRendering()
> +{
> +  while (mQueuedOperations) {
> +    usleep(0);
> +  }

Ick! This should wait on a monitor or something.
 > We actually kinda don't want to flush drawing commands in the stable state,
> because that's still going to block the main thread. Instead we want to wait
> for the next layer tree update (empty transaction or not) and flush then. I
> set you on the wrong path there, sorry.

I think this is actually a problem with my naming. I went with the same as OpenGL, where Flush just makes sure all commands have been pushed to the remote thread, and Finish waits for them to complete executing.

So this shouldn't actually be a problem, we only wait for the rendering to complete
during layer transactions, or when something within the canvas api requires the pixel results.

> 
> I think we need to use a thread pool here. Preferably a thread pool we share
> with other parts of the system.

Agreed, but I couldn't see how to make this work with the existing thread pool code we have.

We often have multiple jobs being scheduled for a single canvas, and we need to ensure that only one of these run at a time, and that they run in order. Even if we put a lock on the canvas, it seems like ordering might not be preserved.

> 
> ::: content/canvas/src/CanvasRenderingContext2D.cpp
> @@ +1028,5 @@
> > +CanvasRenderingContext2D::FinishDelayedRendering()
> > +{
> > +  while (mQueuedOperations) {
> > +    usleep(0);
> > +  }
> 
> Ick! This should wait on a monitor or something.

Do we have any structures other than mozilla::Mutex for doing this? I couldn't figure out how it would work.

If each task takes hold of a lock, then we would temporarily unlock between jobs and there might still be other jobs for the same canvas queued. We could only unlock if we're the last job in the queue (decided by querying mQueuedOperations).

There still seems like a potential for failure since we don't actually take the lock until the rendering thread starts processing. We could (I believe) hit the situation where we push a job to the rendering thread, and then hit a Flush before it starts executing. In this case we'd grab the lock immediately and not wait for rendering.

I considered taking the lock on the main thread when we push the job, and releasing it on the rendering thread when we complete. Unfortunately, that hits a MOZ_ASSERT and crashed.
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> I think this is actually a problem with my naming. I went with the same as
> OpenGL, where Flush just makes sure all commands have been pushed to the
> remote thread, and Finish waits for them to complete executing.

OK great.

> We often have multiple jobs being scheduled for a single canvas, and we need
> to ensure that only one of these run at a time, and that they run in order.
> Even if we put a lock on the canvas, it seems like ordering might not be
> preserved.

How about using a single runnable per canvas and store the job queue on the canvas? When the runnable runs, process jobs until the queue is empty, then return. When adding a job, if the queue is empty then dispatch a runnable while the queue lock is held.
Attached patch Moz2D Api additions v2 (obsolete) — Splinter Review
Attachment #733172 - Attachment is obsolete: true
Attachment #733172 - Flags: feedback?(bas)
Attachment #734994 - Flags: review?
Attached patch DrawTargetSkia implementation v2 (obsolete) — Splinter Review
Attachment #733173 - Attachment is obsolete: true
Attachment #734995 - Flags: review?(gwright)
Attached patch CanvasRenderingContext2D changes (obsolete) — Splinter Review
Added a Monitor, ThreadPool and cleaned it up a lot.

Not requesting review yet, still want to test it more.
Attachment #733174 - Attachment is obsolete: true
Attachment #733174 - Flags: feedback?(roc)
Attachment #734994 - Flags: review? → review?(bas)
Comment on attachment 734994 [details] [diff] [review]
Moz2D Api additions v2

Review of attachment 734994 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/2D.h
@@ +889,5 @@
> +  /**
> +   * Finish writing to the DrawTarget and return a DrawTargetReplay object
> +   * that can be used to replay the commands onto another draw target.
> +   * Future use of the DelayedDrawTarget may result in a crash!
> +   */

Hrm, is there a particular reason we don't just want to expose FinishTo here directly?
(In reply to Bas Schouten (:bas.schouten) from comment #11)

> Hrm, is there a particular reason we don't just want to expose FinishTo here
> directly?

I wanted a marked position where we move the object between threads, in case the backend needs to do something for that.

I'm not sure if that's actually a real concern or not.
I'm a little wary of creating a new DrawTargetSkiaPicture when really DrawTarget only needs to wrap an SkCanvas, which can then be backed by {SkDevice, SkGpuDevice, SkPicture}.
(In reply to George Wright (:gw280) from comment #13)
> I'm a little wary of creating a new DrawTargetSkiaPicture when really
> DrawTarget only needs to wrap an SkCanvas, which can then be backed by
> {SkDevice, SkGpuDevice, SkPicture}.

That's basically what we have, except DrawTargetSkiaPicture retains the SkPicture* because we need it for replaying.
Comparing the profiles of chrome and us on fishbowl we definitely need this patch. Chrome benefits from ferrying all GL calls to a separate process and running them there. We instead do a DT::Flush from the refresh driver, blocking the main thread and throttling our ability to draw.
I looked at making this work while in Taiwan a few weeks ago. It looks like SkPicture alone is not really meant to be used for this kind of thing. They do have a drop-in replacement for deferred canvas, SkDeferredCanvas, which looks like it has potential. I have not figured out if that thing uses a separate thread or just batches the commands (what would be the point of that?).
I looked at SkDeferredCanvas and tried it. It batches draw calls. Its useful because you can cull invisible draw commands and avoid clearing (isFullFrame()). It doesn't do any cross thread business.
From casual inspections I am pretty sure SkDeferredCanvas would work here. Just call flush on a different thread and off you go. However, would it be cleaner to do this at the Moz2D level? Then every backend would benefit. We already have code to record. That code just needs some mild touchups to be replay-able.
(In reply to Andreas Gal :gal from comment #18)
> From casual inspections I am pretty sure SkDeferredCanvas would work here.
> Just call flush on a different thread and off you go. However, would it be
> cleaner to do this at the Moz2D level? Then every backend would benefit. We
> already have code to record. That code just needs some mild touchups to be
> replay-able.

We discussed this in Taiwan and the consensus seemed to be that in the short-term, using backend-specific support for this would be preferable to doing it i moz2d. Apparently D2D has something similar we could use there.
Also, is it really ok to call flush() from a different thread? Do you have to lock the canvas while it's flushing? I am skeptical.
Lock the canvas? why? we don't lock it right now either. It just records everything into a buffer and you replay the buffer. It might be a problem if the SkCanvas we are replaying into was allocated on a different thread and for some reason the GL thread-local business isn't right, but thats probably fixable. If we do this manually in Moz2D, then we would not even set up anything on the main thread. We just setup the recording part and create the DT OMT and reply into that one once recording is done.
(again, this would elegantly take care of eliminating unnecessary texture copies for us since we can optimize the recorded stream)
(In reply to Andreas Gal :gal from comment #21)
> Lock the canvas? why? we don't lock it right now either. It just records
> everything into a buffer and you replay the buffer.

Right, but we don't do any threading stuff right now. I don't think you want to be calling other stuff on SkDeferredCanvas from the main thread while flush() is going on from the helper thread. If we went the moz2d route then of course we would have no trouble replaying while also recording new commands.

> It might be a problem if
> the SkCanvas we are replaying into was allocated on a different thread and
> for some reason the GL thread-local business isn't right, but thats probably
> fixable. If we do this manually in Moz2D, then we would not even set up
> anything on the main thread. We just setup the recording part and create the
> DT OMT and reply into that one once recording is done.

Yeah. I thin Bas had some concerns that the current recording/playback stuff was not suitable for this kind of application, but I don't remember what it was.
Flags: needinfo?(bas)
(In reply to Andreas Gal :gal from comment #18)
> From casual inspections I am pretty sure SkDeferredCanvas would work here.
> Just call flush on a different thread and off you go.

I think you're right. The main problem (that I saw at least), was that SkPicture was sharing the path cache and this seemed to have threading issues. SkDeferredCanvas uses SkGPipe which writes the actual path into the stream.


> However, would it be cleaner to do this at the Moz2D level? Then every backend would benefit.

What we discussed in Taiwan was trying to do this behind an Moz2D api that could be implemented in a backend specific way, or a generic way.

That way we can take advantage of existing solutions without much engineering effort, and worry about bringing up our own implementation for the other platforms slightly later.

It's also plausible that the native versions (especially for D2D) would perform better than our one, depending on the amount of work we sink into it.


> We already have code to record. That code just needs some mild touchups to be
> replay-able.

The current recording code is designed for writing a standalone, platform generic file to disk. It does things like serializing entire fonts into the stream.

We should wait for Bas to comment, but the suggestion was that it wouldn't be easy to get this code performing well enough for what we need.
I agree it makes sense to do this iteratively. Using Skia here is the first step for sure.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #23)
> (In reply to Andreas Gal :gal from comment #21)
> > Lock the canvas? why? we don't lock it right now either. It just records
> > everything into a buffer and you replay the buffer.
> 
> Right, but we don't do any threading stuff right now. I don't think you want
> to be calling other stuff on SkDeferredCanvas from the main thread while
> flush() is going on from the helper thread. If we went the moz2d route then
> of course we would have no trouble replaying while also recording new
> commands.
> 
> > It might be a problem if
> > the SkCanvas we are replaying into was allocated on a different thread and
> > for some reason the GL thread-local business isn't right, but thats probably
> > fixable. If we do this manually in Moz2D, then we would not even set up
> > anything on the main thread. We just setup the recording part and create the
> > DT OMT and reply into that one once recording is done.
> 
> Yeah. I thin Bas had some concerns that the current recording/playback stuff
> was not suitable for this kind of application, but I don't remember what it
> was.

Some objects can be shared across threads and don't need to be serialized. The performance would be a lot better if we just moved references around for them. This would need to be done. I do think eventually we want all this stuff to just be in Moz2D and not use the underlying platform systems in general.
Flags: needinfo?(bas)
Comment on attachment 734994 [details] [diff] [review]
Moz2D Api additions v2

Review of attachment 734994 [details] [diff] [review]:
-----------------------------------------------------------------

I think this has probably been superseeded by the recording DrawTarget work.
Attachment #734994 - Flags: review?(bas)
Attachment #734994 - Attachment is obsolete: true
Attachment #734995 - Attachment is obsolete: true
Attachment #734995 - Flags: review?(gwright)
Attachment #734996 - Attachment is obsolete: true
Attached patch Rebased (obsolete) — Splinter Review
Quickly hacked up a rebase and removed the skia dependencies.

Appears to work, and I got a 16% win on CanvasMark 2013.

This probably won't work with SkiaGL enabled, might even explode.

Hopefully this is enough to experiment with shumway (tweaking kBatchSize, and the thread pool settings might be interesting).
Assignee: nobody → matt.woodrow
This will definitely blow up with SkiaGL. Not sure if there is going to be an easy way to make it work without going back to the one-GLContext-per-canvas way of doing things.
Results using this test:
http://areweflashyet.com/shumway/examples/swfmplayer/?swfm=/tmp/a83a.swfm&fast=true&score=true

cg only (baseline) 8924 (updates: 105, render 8819)
cg + patch         8545 (updates: 104, render: 8442)
skia only          4594 (updates: 129, render:4465)
skia + patch       3623 (updates: 101, render: 3521)
(In reply to Jet Villegas (:jet) from comment #31)
> Results using this test:
> http://areweflashyet.com/shumway/examples/swfmplayer/?swfm=/tmp/a83a.
> swfm&fast=true&score=true
> 
> cg only (baseline) 8924 (updates: 105, render 8819)
> skia only          4594 (updates: 129, render:4465)

Ignore my results from the last run as I had a merge error. Here's the actual results with the patch applied:
cg + patch     5520 (updates: 102, render: 5418)
skia + patch   2439 (updates:  93, render: 2346)

Note that the rendering is incorrect and the viewport appears smaller with the patch applied (on my retina MBP)
Is a lower score better? What do the numbers mean?
Attached patch Fixed transform bug (obsolete) — Splinter Review
Fixed the transform issue.

Results are unlikely to be as good as before, since it was drawing a smaller area which would skew results.
Attachment #8585824 - Attachment is obsolete: true
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #33)
> Is a lower score better? What do the numbers mean?

Yes, it's just elapsed time in ms.
That's pretty surprising that software Skia smokes CG so handily. I wonder how it compares on other workloads. Would be nice if we could eliminate all of the CG stuff.
The gradient code in CG is noticeably slower than Skia. This swf uses quite a bit of gradient.
This looks great on this test:
cg + patch     4923 (updates: 98,  render: 4825)
skia + patch   2169 (updates: 103, render: 2067)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #30)
> This will definitely blow up with SkiaGL. Not sure if there is going to be
> an easy way to make it work without going back to the
> one-GLContext-per-canvas way of doing things.

I think we could restrict the thread pool to a single thread for SkiaGL to avoid needing to do that.

It won't scale as well when you have multiple canvases, but should still be an improvement.

The other option is to have one GLContext per thread in the thread pool and make sure that canvases that share a GLContext also share the same thread. That's considerably harder though, so I'm not in a rush.
Rewrote the code to use MediaTaskQueue which is a huge simplification.
Attachment #8586604 - Attachment is obsolete: true
Depends on: 932958
Attachment #8587072 - Attachment is obsolete: true
Attachment #8594614 - Flags: review?(bas)
Attachment #8594614 - Flags: review?(bobbyholley)
Comment on attachment 8594614 [details] [diff] [review]
Run canvas rendering asynchronously

Review of attachment 8594614 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on the MediaTaskQueue usage with those bits fixed.

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +1069,5 @@
> +  mShutdownObserver = nullptr;
> +  FlushDelayedTarget();
> +  FinishDelayedRendering();
> +  mTaskQueue->BeginShutdown();
> +  mTaskQueue = nullptr;

Hmm, this is an interesting one - all of the callsites I'm aware of for shutting down MediaTaskQueue either use the returned promise or await idle. Assuming you don't need to wait for the queue to drain before shutting down the queue I _think_ that this is fine, since the runner should keep the task queue alive until the events are drained. But just to be safe, please MOZ_DIAGNOSTIC_ASSERT(IsEmpty()) in the MediaTaskQueue destructor.

@@ +1590,5 @@
> +    return;
> +  }
> +  mPendingCommands = 0;
> +
> +  mTaskQueue->Dispatch(new DrawCaptureTask(mDelayedTarget, mFinalTarget));

Please use an nsCOMPtr<nsIRunnable> and then .forget() it. This current line will go through the TemporaryRef overload, which we're trying to get rid of.
Attachment #8594614 - Flags: review?(bobbyholley) → review+
Comment on attachment 8594614 [details] [diff] [review]
Run canvas rendering asynchronously

Review of attachment 8594614 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +1474,5 @@
>          SkiaGLGlue* glue = gfxPlatform::GetPlatform()->GetSkiaGLGlue();
>  
>          if (glue && glue->GetGrContext() && glue->GetGLContext()) {
> +          // Don't use mFinalTarget (async canvas drawing) with SkiaGL, because we currently
> +          // use a single GLContext and need them all to be on the same thread.

This ought to also be true with Direct2D. We shouldn't be able to use D2D on another thread while using the same D3D device. But perhaps this is not true if only other D2D code is using that device, have you tested this?

@@ +1590,5 @@
> +    return;
> +  }
> +  mPendingCommands = 0;
> +
> +  mTaskQueue->Dispatch(new DrawCaptureTask(mDelayedTarget, mFinalTarget));

@bholley: That's so ugly...
(In reply to Bas Schouten (:bas.schouten) from comment #43)
> @bholley: That's so ugly...

In a few days you'll be able to just ->Dispatch(do_AddRef(new foo)) - See bug 1155059 comment 23.
(In reply to Bas Schouten (:bas.schouten) from comment #43)
> 
> This ought to also be true with Direct2D. We shouldn't be able to use D2D on
> another thread while using the same D3D device. But perhaps this is not true
> if only other D2D code is using that device, have you tested this?

Yes indeed, it crashes the driver :)

I've restricted it to OSX with an #ifdef at the moment, will add an extra comment to make that more clear.
Comment on attachment 8594614 [details] [diff] [review]
Run canvas rendering asynchronously

Review of attachment 8594614 [details] [diff] [review]:
-----------------------------------------------------------------

Hrm, I'm thinking there should be a way to make this work with D2D as well. Since I had the multithreaded DrawTargetTiled working with D2D as well. Hrm.
Attachment #8594614 - Flags: review?(bas) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #46)
> Hrm, I'm thinking there should be a way to make this work with D2D as well.
> Since I had the multithreaded DrawTargetTiled working with D2D as well. Hrm.

Do you have the multithreaded DrawTargetTiled patches somewhere?
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=9737660&repo=mozilla-inbound
Flags: needinfo?(matt.woodrow)
Yeah, what pulsebot said. Dunno how it would have been on OS X, since you were on top of bustage which meant I couldn't allow the Mac builds to run, but every other platform exploded in what I presume is every single case where any test in any suite tried to use a canvas for anything.
Bobby, we now enforce tail dispatch for events created on the main thread and dispatch to a MediaTaskQueue. This is very much not what we want for this task queue, and hits assertions when we call AwaitIdle.

It seems like we want to enforce tail dispatch for the main thread wrt specific task queues (the one used by MDSM in particular), does that make sense?

Any other ideas?
Flags: needinfo?(matt.woodrow) → needinfo?(bobbyholley)
Yeah, we basically need to enforce tail dispatch for any (dispatcher, dispatchee) pair that is involved in state mirroring.

So I think we should rename aRequireTailDispatch to aSupportsTailDispatch, and use tail dispatch i.f.f. _both_ the the dispatcher and the dispatchee support tail dispatch.

Sound good? If so, please flag me for review on the patch.
Flags: needinfo?(bobbyholley)
Comment on attachment 8605023 [details] [diff] [review]
Only enforce tail dispatch if both source and dest support it

Review of attachment 8605023 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with that.

::: dom/media/StateMirroring.h
@@ +137,5 @@
>      {
>        MIRROR_LOG("%s [%p] adding mirror %p", mName, this, aMirror);
>        MOZ_ASSERT(OwnerThread()->IsCurrentThreadIn());
>        MOZ_ASSERT(!mMirrors.Contains(aMirror));
> +      MOZ_ASSERT(OwnerThread()->RequiresTailDispatch(aMirror->OwnerThread()), "Can't get coherency without tail dispatch");

The thing we care most about asserting is that canonical->mirror dispatches require tail dispatches, and this is asserting it for mirror->canonical dispatches, IIUC.

I'd prefer to assert aThread->SupportsTailDispatcher() in both Canonical::Impl() and Mirror::Impl(), and assert OwnerThread()->RequiresTailDispatch(aCanonical->OwnerThread()) in Mirror::Impl::Connect.
Attachment #8605023 - Flags: review?(bobbyholley) → review+
Depends on: 1165710
Depends on: 1165613
This seems to be causing all sorts of crashes on OS X at least. Can we back it out?
Depends on: 1165799
(In reply to Jeff Muizelaar [:jrmuizel] from comment #60)
> This seems to be causing all sorts of crashes on OS X at least. Can we back
> it out?

Yeah, will do.
Backed out:  https://hg.mozilla.org/integration/mozilla-inbound/rev/55b0b5c5f18c
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1165842
Depends on: 1165686
Depends on: 1165706
We've decided to put this on hold for now, and instead go with enabling skia-gl for OSX (bug 1150944).

Moz2D really isn't designed to be threadsafe (currently at least), and these patches really pushed the limits of what we can safely do.

It's hard to come up with something that isn't fragile without adding proper thread safety to Moz2d, and the bugs that result from this can be very subtle and hard to find.

We're seeing bigger wins (on most testcases) for much less effort using skia-gl, so I don't intend to look at this again unless we have other reasons to do so.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Flags: needinfo?(matt.woodrow)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: