requestAnimationFrame for Workers

NEW
Unassigned

Status

()

Core
Graphics: Layers
P3
normal
2 years ago
2 months ago

People

(Reporter: roc, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {dev-doc-needed, devrel-needed})

Trunk
dev-doc-needed, devrel-needed
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [gfx-noted])

This has been discussed in the WHATWG and there is consensus that we should have requestAnimationFrame in DedicatedWorkerGlobalScope. The semantics are just like main-thread rAF, but we'll implement it differently to support minimizing rendering latency.

Initially this will be used with WebGL-Workers, bug 709490, but later we'll add APIs to let the Worker manipulate CSS transforms and other inputs to the compositor.
Plan:
1) When rAF is called in a Worker, register the Worker with the compositor (probably via ImageBridge). Queue the rAF callback in a queue in the Worker.
2) When the compositor gets a vsync signal, send a rAF notification to all registered Workers. If there are none, continue with an immediate composite. Otherwise set a timer to expire after some fraction of the frame budget has elapsed (e.g. 5ms) and return to the event loop.
3) When a worker gets its rAF notification, it runs its callbacks and then sends a completion reply to the compositor. If one of the callbacks does another rAF (common), don't re-register until sending the completion reply.
4) When the timer expires, or all rAF notifications sent in the last iteration of step 2 have received completion replies, start the actual composite.

Implications:
a) Minimal latency. If a Worker meets its frame deadline, its results will be presented at the next vsync.
b) Isolation. If one Worker meets its deadline but others don't, the good Worker will still get the latency and frame rate guarantees.

Issues:
a) What policy should we use for Workers that fail to meet deadlines? Assuming the usual rAF loop, with the above plan, the compositor will issue an rAF notification at the next vsync after the completion of the Worker's previous rAF callback(s). Some people want this, but others are adamant that a stable frame rate is better, and some of those are adamant that it should be some factor of the frame rate (e.g. 30fps on a 60fps display). I plan to put this off to a separate bug. We may want to have API to let developers choose the policy, or implement their own policy.
b) With the above plan, when a worker uses rAF composition does not start for some milliseconds after vsync. That means the main thread(s) have a chance to finish their RefreshDriver ticks and submit layer tree updates before composition. It may be undesirable to affect main thread behavior in this way. Or, we could take the position that this is a good thing (reduces latency), and maybe even use the delayed-composition timer all the time, even when there are no Worker rAFs.

Anyone have an opinion on b), or any of this? The plan matches what we talked about in Whistler.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bas)
I guess I really should just ask Mason. Sorry for the spam.
Flags: needinfo?(mchang)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bas)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #1)
> b) With the above plan, when a worker uses rAF composition does not start
> for some milliseconds after vsync. That means the main thread(s) have a
> chance to finish their RefreshDriver ticks and submit layer tree updates
> before composition. It may be undesirable to affect main thread behavior in
> this way. Or, we could take the position that this is a good thing (reduces
> latency), and maybe even use the delayed-composition timer all the time,
> even when there are no Worker rAFs.

Actually right now it seems like there's nothing strictly preventing a refresh driver tick triggered by a vsync event from completing and having its layer tree update processed by the composition triggered by that same vsync event ... except the unlikely scheduling that would be required.
So if we have some main thread doing rAFs and a Worker doing rAFs, and the compositor is waiting for the Worker to reply and receives a layer tree update from the main thread in response to the most recent vsync signal, should we apply it or not? It's hard to not apply it, since it could be a sync Update call. But if we do apply it, then we're vulnerable to stuttering, if the layer transaction update alternates between arriving just before the Worker reply and just after it.

Though I guess we already have a stuttering issue if the layer transaction update alternates between arriving just before the next vsync or just after it. Right?

I think these problems would go away, or at least not get significantly worse, if we triggered refresh driver ticks with a message from the compositor when composition has actually begun, instead of sending the vsync signal to the refresh driver directly. I suspect there's a reason why we're not doing that already; what is it?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> So if we have some main thread doing rAFs and a Worker doing rAFs, and the
> compositor is waiting for the Worker to reply and receives a layer tree
> update from the main thread in response to the most recent vsync signal,
> should we apply it or not? It's hard to not apply it, since it could be a
> sync Update call. But if we do apply it, then we're vulnerable to
> stuttering, if the layer transaction update alternates between arriving just
> before the Worker reply and just after it.

Do you mean applying as in receiving the main thread transaction and handling it, or as in compositing it?
Delaying the reception/handling of the transaction would mean writing the kind of code that wake you up at night, but not scheduling the composition after receiving the main-thread transaction if we know the worker is about to schedule it and we would rather align to the worker is not unreasonable (I mean in term of how it affects our code).
FWIW, I am currently reworking vsync some in bug 1184283 -- I don't think it's going to affect anything here, other than the worker being able to directly listen to vsync events and get notified of them at the same time the compositor is.  I'd maybe suggest something like:

- workers with rAF register themselves with compositor thread
- vsync event occurs; compositor and all workers with rAF are notified of vsync
- on workers: rAF runs, and at the end, a completion message is posted to compositor thread with vsync timestamp.  (if rAF was not re-requested, then at the end of the rAF callback, unregister from compositor)
- on compositor
  1. compositor sets timeout and waits until all worker rAFs have signaled completion, or timeout occurs
  2. compositor then performs its normal compositing

however, I don't really like what happens on the compositor thread, if I'm understanding it right.  Instead, we really should be pipelining this... worker (and main thread!) rAF should be doing work on frame N+1, while the compositor draws frame N.  So when the compositor receives a vsync event, it should already have all the data needed to draw that frame, including the results of any main thread or rAF calls.  I understand that this reduces latency, and normally I'd be all for that, but the timing seems really tricky.

But, maybe we deal with that part later, and just keep in mind the possibility of the rest of the system working on frame N+1 (or N+k, really) while the compositor is putting frame N on the screen?
I guess we need to decide how important it is to remove that extra frame of latency. I would have assumed that VR was the case that needed this the most, but if Vlad isn't too concerned then maybe this isn't a super high priority.

If we do decide to go that way, then I think we should treat the main-thread the same as the workers. The compositor will wait on all of the content 'sources' (workers and main-thread), and then compositors when they're all done or the timeout expires.
(In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> If we do decide to go that way, then I think we should treat the main-thread
> the same as the workers. The compositor will wait on all of the content
> 'sources' (workers and main-thread), and then compositors when they're all
> done or the timeout expires.

Sounds like the simplest thing to implement so we could just implement it that way and remove the frame of latency for webgl workers as a second step when we find out experimentally that the extra frame of latency is unacceptable.
Wasn't there also UI workers that do need to be on the same frame as the compositor (in order to drive APZ animations), though?
Most of the proposal makes sense. Comments below.

> a) What policy should we use for Workers that fail to meet deadlines?

We don't really have a strong scheduling policy in general when we can't meet frame deadlines for either the compositor or main thread. We just try to keep up with vsync. If we do decide that we want some kind of policy that we want a stable frame rate, we should make that a global policy across all of our schedulers. Otherwise, I'd prefer to wait until the next vsync just for consistency sake.

> b) With the above plan, when a worker uses rAF, composition does not start for some milliseconds after vsync.

For the cases where we expect to use worker rAF, are we expecting the main thread to not be as busy? What's the reason in general that we'd like to prioritize worker threads over the main thread? It seems like we could make certain applications faster at the expense of the main thread and whole browser in general. What happens if this tab becomes a background tab, do they need to switch to an inactive rAF timer?

> Though I guess we already have a stuttering issue if the layer transaction update alternates between
> arriving just before the next vsync or just after it. Right?

Correct.

> I think these problems would go away, or at least not get significantly worse, if we triggered refresh > driver ticks with a message from the compositor
> when composition has actually begun, instead of sending the vsync signal to the refresh driver
> directly. I suspect there's a reason why we're not doing that already; what is it?

There isn't any reason, it just wasn't part of Silk. Previously, we just ticked things are random times, so there's no good reason why we can't do this.

I kind of like Vlad's proposal of painting / rAFing on frame N+1 and only compositing frame N.

A few other questions:
1) From a worker's rAF perspective, will it get the compositor / vsync timestamp?
2) Are we fairly certain we can composite and run the rAF js in < 5 ms?
Flags: needinfo?(mchang) → needinfo?(roc)
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #6)
> however, I don't really like what happens on the compositor thread, if I'm
> understanding it right.  Instead, we really should be pipelining this...
> worker (and main thread!) rAF should be doing work on frame N+1, while the
> compositor draws frame N.

OK, sounds like draw+composite in the same vsync interval is not something we need to do, or at least something we should not do yet. Good, that simplifies things. (Though I'm surprised, I thought every millisecond reduction of latency was worth getting for VR.)

Direct listening to vsync events is a good idea.

(In reply to Mason Chang [:mchang] from comment #9)
> > b) With the above plan, when a worker uses rAF, composition does not start for some milliseconds after vsync.
> 
> For the cases where we expect to use worker rAF, are we expecting the main
> thread to not be as busy? What's the reason in general that we'd like to
> prioritize worker threads over the main thread? It seems like we could make
> certain applications faster at the expense of the main thread and whole
> browser in general. What happens if this tab becomes a background tab, do
> they need to switch to an inactive rAF timer?

I didn't think about background tabs!

I think we should take a hard line from the start and say that rAF callbacks in Workers associated with background tabs do not fire at all. We could go further and say that if you don't have an OffscreenCanvas for a DOM canvas that's visible on the screen, rAF does not fire.

> A few other questions:
> 1) From a worker's rAF perspective, will it get the compositor / vsync
> timestamp?

I don't think that's necessary.

> 2) Are we fairly certain we can composite and run the rAF js in < 5 ms?

Not an issue if we're not trying to draw and composite in the same vsync interval.

So the revised plan is much simpler because the worker doesn't need to communicate with the compositor at all! But we do need to account for tab/canvas visibility.
1) Create an per-DOM-window vsync observer object that's set up on the main thread to listen to Vlad's new per-widget vsync. Watch out for widget changes, i.e. tab dragging.
2) Give that observer object a thread-safe list of Workers that are currently relevant for rAF. Populate this list from layout based on canvas and tab visibility.
3) When a vsync event happens, notify each relevant Worker directly, triggering its pending rAF callbacks
Flags: needinfo?(roc)
Er, no, because of e10s of course we can't send vsync signals directly from the widget to a worker.
So Vlad ... how were you planning to handle e10s in bug 1184283, when a refresh driver is running in a content process?
Flags: needinfo?(vladimir)
OK, it looks like you have PVSync for that and I can use the widget vsync observer. Great.

1) Create a per-DOM-window vsync observer manager (WVOM) that belongs to nsGlobalWindow, lazily instantiated.
2) Give each WVOM a mutex-protected list of WorkerCrossThreadDispatchers, one for each Worker of the nsGlobalWindow that has used rAF.
3) When a Worker does a rAF, if it's the first one, post an event to the main thread to register the Worker with its owning window. In any case, queue the callback in the Worker.
4) Give each WVOM a VSyncObserver that's registered with the Widget containing the nsGlobalWindow while the tab is not hidden (need to update this when the widget changes and visibility changes).
5) When the VSyncObserver fires, and when it's (re)registered with a widget, copy the WorkerCrossThreadDispatcher list to a local, and dispatch a WorkerTask to each entry of the copy that triggers rAF callback processing. Remove list entries for any WorkerCrossThreadDispatcher whose worker has died.
6) When adding a Worker's WorkerCrossThreadDispatcher to the list, check all the list entries and remove entries for Workers that have died. Simpler than having Worker death notify the WVOM.
Flags: needinfo?(vladimir)
Depends on: 1184283, 709490
Keywords: dev-doc-needed
> I didn't think about background tabs!
> 
> I think we should take a hard line from the start and say that rAF callbacks
> in Workers associated with background tabs do not fire at all. We could go
> further and say that if you don't have an OffscreenCanvas for a DOM canvas
> that's visible on the screen, rAF does not fire.

That sounds good to me.

> 
> > A few other questions:
> > 1) From a worker's rAF perspective, will it get the compositor / vsync
> > timestamp?
> 
> I don't think that's necessary.

If people are using rAF on a worker thread and the timestamp, the timestamp when the
worker runs will not be as uniform as the vsync timestamp. If this isn't an issue, OK,
but I think for smoothness purposes, it's better to feed the vsync timestamp.

> 1) Create a per-DOM-window vsync observer manager
> (WVOM) that belongs to nsGlobalWindow, lazily instantiated.

Why nsglobalwindow and not nsBaseWidget? If you directly hook into the widget,
you shouldn't need any logic to listen for widget/visibility changes as the widget
should do that for you and switch to the proper vsync notification.

> 4) Give each WVOM a VSyncObserver that's registered with the Widget containing the nsGlobalWindow
> while the tab is not hidden (need to update this when the widget changes and visibility changes).

Can you just hook into the widget's observer instead of adding another one?
Flags: needinfo?(roc)
We don't want to making widget/ code depend on Workers. And in step 4, when an nsGlobalWindow's tab is hidden or moves between widgets, we need to find all workers associated with the window.

I'm not sure what you mean by "the widget's observer". If I read the code in bug 1184283 right, the widget supports multiple gfx::VsyncObservers, and we need to add one or more for our use.
https://github.com/vvuk/gecko-dev/blob/4a21d44e9dcb3584dd3ebb41ebf8d750874d1c9a/widget/nsIWidget.h#L507
Flags: needinfo?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> OK, it looks like you have PVSync for that and I can use the widget vsync
> observer. Great.

... except PVsync is tied to the main thread of the content process, which is no good for my purposes. Bah.

How about we make PVsync managed by PImageBridge so content process vsyncs can be dispatched on the ImageBridge thread?
Flags: needinfo?(vladimir)
Flags: needinfo?(vladimir)
I have patches for this here: https://bitbucket.org/rocallahan/mozilla-inbound/commits/all
Seems to work fine. It gets vsync notifications from VsyncChild, which currently lives on the main thread, so we need more work to make sure that worker vsync notifications aren't blocked by main-thread stalls. I'll do that work after bug 1184283 is done, since it will tread all over the same code. Likewise, the patch currently only works for workers in e10s content processes; extending it to support workers in the chrome process should happen after bug 1184283.

This patch supports nested workers, which I hadn't considered in the design discussions above. For nested workers we can't hold a pointer to the WorkerPrivate from an nsGlobalWindow. So this design revolves around a WorkerVsyncObserver object created by each worker using requestAnimationFrame, and each inner nsGlobalWindow holds a set of those. Starting and stopping vsync observing always happens on the main thread.
Blocks: 1123229
Keywords: devrel-needed
Assignee: roc → nobody
Whiteboard: [gfx-noted]
status-firefox43: affected → ---

Updated

3 months ago
Blocks: 1390089
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.