Open Bug 706499 Opened 8 years ago Updated 2 years ago

Integrate Async Plugin composition with OMTC

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set

Tracking

()

People

(Reporter: BenWa, Unassigned)

References

Details

We currently draw plugins async. This means we should be able to draw them asynchronously from the main thread with OMTC. 

This is, at the very least, be possible on Mac and gives us an opportunity to clean up our usage of IOSurface. We currently call CGLTexImageIOSurface2D every frame where we should instead let the image layer know that were using 2 IOSurface as a double buffer and simply swap which IOSurface backed texture were drawing from.

The only foreseeable problem is that we communicate surface updates to the main gecko thread using:
  sync Show(NPRect updatedRect, SurfaceDescriptor newSurface)
    returns (SurfaceDescriptor prevSurface);

We could create a channel between the compositing thread and the plugin. What would you suggest Chris?
This is approximately what bug 598277 originally covered.  Some of the framework is already in place.

Yes, creating a channel between plugin and compositing thread is the right approach with the current codebase.  There's some new code coming in with the new async drawing model that might have different requirements.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #1)
> There's some new code coming in with
> the new async drawing model that might have different requirements.

No, that is not going to use IPDL for setting the surface, we'll use some kind of cross-process mutex instead.
Right, different requirements.

We could argue about that approach but I don't care about plugins anymore ;).
If you have something against that approach, let me know somewhere other than this bug --- I'd like your feedback.
As we discussed before, that approach is going to make it much harder to have non-glitchy drawing of multiple plugin instances, but "meh" at this point.  This also makes cross-process plugin drawing code diverge even more from cross-process content drawing, which is suboptimal maintenance-wise.  Bullet-proofing the cross-process mutex code against buggy plugins will also be a bit nontrivial.
I don't understand any of those points.
I'm ready to begin work on this for the Mac. We've verified that composition is async by the FPS counter continuing but this seems like an easy candidate to test the feature. Also mac plugin rendering using basic layers is horrible since we go IOSurface->thebes layer->texture.

What's the best way to get an IPC channel between the plugin thread, in the plugin process and the compositor thread.
You want to use the |opens| IPDL feature.  See 

http://mxr.mozilla.org/mozilla-central/source/ipc/ipdl/test/cxx/PTestOpens.ipdl

for IPDL syntax, and 

http://mxr.mozilla.org/mozilla-central/source/ipc/ipdl/test/cxx/TestOpens.cpp

for C++ usage.  You can create the channel from either the chrome or plugin process (not sure which is more convenient for you).  Then you'll want to Open() the parent end of the channel (see .cpp above) on the compositor thread in chrome, and the child end on the main thread in the plugin process.

Not sure how this should interacts with new drawing APIs.  But assuming those aren't going to land within the next few days, I would start by improving the current impl.
Also, you should be able to use the same PCompositor protocol for this channel as I assume you'll be using across main<-->compositor threads.
The current approach Bas is working on is to add a mode to ImageContainer which exports a cross-process mutex and a piece of shared memory that another process can use to set the current image on the container --- take the mutex and write some shared handle value into the shared memory. I'm not sure where he's at with that, you guys should coordinate.

We can use this for ShadowImageContainers as well. In fact maybe we can just use it for all ImageContainers. They're already a cross-thread container for images, making them cross-process makes sense.

The only problem is that sometimes we need to make SetCurrentImage happen as part of a transaction. Probably we should just have a separate method for that, that goes through IPDL and sets the image on the compositor thread. That method would be main-thread-only and used only by layout, not plugins or video decoding.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10)
> The current approach Bas is working on is to add a mode to ImageContainer
> which exports a cross-process mutex and a piece of shared memory that
> another process can use to set the current image on the container --- take
> the mutex and write some shared handle value into the shared memory.

That assumes Mac plugins can work that way. If they can't, then, uh, talk to Bas!
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
Thanks for the info. I'm going to start on this tomorrow (unless we decide I should block on bas' changes).

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10)
> The only problem is that sometimes we need to make SetCurrentImage happen as
> part of a transaction. Probably we should just have a separate method for
> that, that goes through IPDL and sets the image on the compositor thread.

I'm not particularly clear on what needs and doesn't need to be happen within a transaction? I assume that all async plugin can stand on its own?

Re: Cross process mutex

I see 4 options we can use:
1) We can use the cross process mutex but that means the compositor can block on a lock. The block should be really fast assuming cross process mutex are cheap.
2) We can reuse the current 'sync Show()' architecture. This suck because the plugin can block on the compositor which could be waiting on vsync.
3) We can triple buffer. No blocking anywhere but it's wasteful.
4) We can send the back buffer async between the compositor and the plugin. This can lead to a situation where the plugin wants to paint but does not own the back buffer. This happens if it's painting faster then we're compositing. In that case we delay the plugin repaint. This has the nice effect of throttling the plugin redraw to be no faster then what we're compositing, there's no blocking and no wasted memory to a triple buffer.

As you can see my favorite option is #4. Thoughts?

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #11)
> That assumes Mac plugins can work that way. If they can't, then, uh, talk to
> Bas!

The're both easy on Mac. IOSurfaces are very flexible.
(In reply to Benoit Girard (:BenWa) from comment #12)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10)
> > The only problem is that sometimes we need to make SetCurrentImage happen as
> > part of a transaction. Probably we should just have a separate method for
> > that, that goes through IPDL and sets the image on the compositor thread.
> 
> I'm not particularly clear on what needs and doesn't need to be happen
> within a transaction? I assume that all async plugin can stand on its own?

Right. It can.

> Re: Cross process mutex
> 
> I see 4 options we can use:
> 1) We can use the cross process mutex but that means the compositor can
> block on a lock. The block should be really fast assuming cross process
> mutex are cheap.

I don't think this is a problem ultimately for the compositor. SetCurrentImage should be over very fast.

> As you can see my favorite option is #4. Thoughts?

Mine and Bas's is #1.

We've gone backwards and forwards on #4. It affects the plugin async drawing API design. I'm OK with it, but Bas loathes it. We'll have a separate notification that the plugin can use to throttle drawing.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> We've gone backwards and forwards on #4. It affects the plugin async drawing
> API design. I'm OK with it, but Bas loathes it. We'll have a separate
> notification that the plugin can use to throttle drawing.

It does make sense for the async NPAPI proposal but it's not as clear for the exists models. Plug-ins don't receive anything like DidComposite so it's up to us to throttle them. So #4 let's us throttle and swap surfaces all in one.

In any case I don't feel too strongly since there both good designs.
Sorry, missed this while traveling.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #6)
> I don't understand any of those points.

(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> As we discussed before, that approach is going to make it much harder to
> have non-glitchy drawing of multiple plugin instances, but "meh" at this
> point.

Assume we have a page with instances A, B of plugin P.  If there are pages that expect A, B to be drawn during the same "event-loop iteration", i.e. "atomically", it's much harder to do this with the mutex design.  Does that make sense?  But like I said, I don't care about this case anymore.  

> This also makes cross-process plugin drawing code diverge even more
> from cross-process content drawing, which is suboptimal maintenance-wise.

Not sure what's unclear about this.  The original plan was to use PLayers for plugins as well as content and compositor.  Now we have a second (third in the current setup) system to maintain.

> Bullet-proofing the cross-process mutex code against buggy plugins will also
> be a bit nontrivial.

We have to prove that no sequence of plugin crashes will deadlock the browser or lead to other bad behavior.  This is the kind of thing I would use a model checker for.  It's not easy.  Frankly, my faith that client OS's get this right is quite low.  In fact, I'm pretty sure the linux cross-process mutex primitive is vulnerable to a race condition that could deadlock the compositor.  So we would need to build in another mechanism to guard against that.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #15)
> Assume we have a page with instances A, B of plugin P.  If there are pages
> that expect A, B to be drawn during the same "event-loop iteration", i.e.
> "atomically", it's much harder to do this with the mutex design.  Does that
> make sense?  But like I said, I don't care about this case anymore.

I can't think of a page I've seen that would care about this.

> > This also makes cross-process plugin drawing code diverge even more
> > from cross-process content drawing, which is suboptimal maintenance-wise.
> 
> Not sure what's unclear about this.  The original plan was to use PLayers
> for plugins as well as content and compositor.  Now we have a second (third
> in the current setup) system to maintain.

I want to integrate this deeply into our existing ImageContainer logic. It already supports atomic SetCurrentImage across threads in the same process, extending this across process isn't too strange.

> We have to prove that no sequence of plugin crashes will deadlock the
> browser or lead to other bad behavior.  This is the kind of thing I would
> use a model checker for.  It's not easy.  Frankly, my faith that client OS's
> get this right is quite low.  In fact, I'm pretty sure the linux
> cross-process mutex primitive is vulnerable to a race condition that could
> deadlock the compositor.  So we would need to build in another mechanism to
> guard against that.

Is there some specific bug in Linux PTHREAD_PROCESS_SHARED mutexes and PTHREAD_MUTEX_ROBUST_NP that you're thinking of?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> > We have to prove that no sequence of plugin crashes will deadlock the
> > browser or lead to other bad behavior.  This is the kind of thing I would
> > use a model checker for.  It's not easy.  Frankly, my faith that client OS's
> > get this right is quite low.  In fact, I'm pretty sure the linux
> > cross-process mutex primitive is vulnerable to a race condition that could
> > deadlock the compositor.  So we would need to build in another mechanism to
> > guard against that.
> 
> Is there some specific bug in Linux PTHREAD_PROCESS_SHARED mutexes and
> PTHREAD_MUTEX_ROBUST_NP that you're thinking of?

There was a problem I was worried about, but there's a pretty fabulous little hack that looks to correctly avoid it.  So nothing concrete worrying me for linux.  Fwiw, I feel more confident about win32 cross-process mutexes than linux ones.

The odds of this being implemented correctly in bionic (android libc) are just about 0%.  We'll probably need a fallback path for when we don't have this primitive.
To summarize, my main objections to the cross-process swap approach were
 (1) Eliminates possibility of atomic updates across plugin instances from the same plugin type
I don't care about this anymore, maybe no one ever did.

 (2) Making impl robust to plugin crashes
glibc/linux impl looks OK, win32 should be as good (hopefully).  Not sure about mac.  bionic/linux almost certainly broken.
 - Was using an atomic swap instead of mutex+update considered?  Easier to maintain than OS/libc-specific cross-process mutex for Tier Is: just two architectures.  Possibly worse for < Tier I, but then again we'd have to look at cross-process impl for < Tier I too.

 (3) Optimizing existing IPC to handle this case instead of building new mechanism
Odds of getting someone to do this soon are low, new mechanism seems OK.

If we can use an atomic swap instead of a cross-process mutex, I'd feel a little better about the approach.  But it seems OK overall.
I think using shared memory (ashmem) and atomic operations would work with bionic. In fact, that's the only thing I think would work on bionic, unless we basically reimplement the right APIs on top of what the kernel provides, which is much much more than what bionic exposes.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> To summarize, my main objections to the cross-process swap approach were
>  (1) Eliminates possibility of atomic updates across plugin instances from
> the same plugin type
> I don't care about this anymore, maybe no one ever did.
> 
>  (2) Making impl robust to plugin crashes
> glibc/linux impl looks OK, win32 should be as good (hopefully).  Not sure
> about mac.  bionic/linux almost certainly broken.
>  - Was using an atomic swap instead of mutex+update considered?

I did consider this, the problem is you need to block while the actual surface data is being composited. So in the rare case the compositor is -actually- currently blitting the plugin you want to wait for that to finish, so a simple interlocked exchange on the pointer won't work.
(In reply to Mike Hommey [:glandium] from comment #19)
> I think using shared memory (ashmem) and atomic operations would work with
> bionic. In fact, that's the only thing I think would work on bionic, unless
> we basically reimplement the right APIs on top of what the kernel provides,
> which is much much more than what bionic exposes.

For NPAPI Async Drawing the contention situation should be rare. We could do a spinlock through shared memory on bionic. That's pretty horrible though, but if there's no way to do it properly throught he scheduler, that might be the only option.

Fwiw, as far as I understand it, NPAPI Async Drawing is mainly wanted by adobe for windows.
Ugh. Busy-waiting is a really bad failure mode. Maybe we could somehow ignore Android for our immediate needs here but it seems likely this would come back to bite us.

Using IPDL and threads, though, means we have to do
  plugin main thread SetCurrentAsyncSurface sends message to plugin helper thread, blocks
  plugin helper thread makes async IPDL call to compositor thread
  compositor thread does SetCurrentImage
  compositor thread sends async IPDL reply to plugin helper thread
  plugin helper thread sends message to plugin main thread, which wakes and returns from SetCurrentAsyncSurface

Maybe we can hide that behind some abstraction that can also be implemented with a cross-process mutex.

We can change the plugin API so that SetCurrentAsyncSurface is a fully async operation, with a separate reply method, which gets us (I think) to

  plugin main thread makes async IPDL call to compositor thread and returns to event loop
  compositor thread does SetCurrentImage
  compositor thread sends async IPDL reply to plugin main thread

But Bas gets upset when I suggest that.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #22)
> We can change the plugin API so that SetCurrentAsyncSurface is a fully async
> operation, with a separate reply method, which gets us (I think) to
> 
>   plugin main thread makes async IPDL call to compositor thread and returns
> to event loop
>   compositor thread does SetCurrentImage
>   compositor thread sends async IPDL reply to plugin main thread
> 
> But Bas gets upset when I suggest that.

I think that designing the plugin API to expect an async response give us more flexibility. In cases where we know we can use a cross process mutex we can deliver the reply right away without hitting IPDL. In platforms where we can't get cross process mutex it lets us do something like you mention. The major problem with doing it this way is that we either have to have it go through the plugin's main thread or we need a separate thread in the cases where we need to use IPDL.

We should either make this API windows specific if that's the need we have, or we should make the API general enough to be implemented across platform.
Why do we have to busy wait while a swapped image is being composited?  That doesn't make sense to me.
We can't return the "old current" surface from SetCurrentAsyncSurface until the compositor has finished with it. So SetCurrentAsyncSurface as currently specced has to wait.
So the spec says SetCurrent is guaranteed to succeed and return the previous surface?  If so, ok I understand, but that's possibly a bit unfortunate.
See comments #22 and #23. Bas is strongly opposed to the obvious alternative.
(In reply to Benoit Girard (:BenWa) from comment #23)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #22)
> > We can change the plugin API so that SetCurrentAsyncSurface is a fully async
> > operation, with a separate reply method, which gets us (I think) to
> > 
> >   plugin main thread makes async IPDL call to compositor thread and returns
> > to event loop
> >   compositor thread does SetCurrentImage
> >   compositor thread sends async IPDL reply to plugin main thread
> > 
> > But Bas gets upset when I suggest that.
> 
> I think that designing the plugin API to expect an async response give us
> more flexibility. In cases where we know we can use a cross process mutex we
> can deliver the reply right away without hitting IPDL.

The problem is that you'd get that message on some thread where you're running our message loop. Meaning you'd still need to wait on whatever thread receives the reply in order to continue drawing. Meaning if you want to do what you'd do for example with Direct3D or GL, where you want to present your backbuffer and go on drawing right away to your old backbuffer, you wouldn't be able to take that approach.

I wonder if there's something that can be done using pipes to implement a sort of cross-process mutex on Android(perhaps in a sub-optimal way), without busy-waiting.
> I wonder if there's something that can be done using pipes to implement a sort of
> cross-process mutex on Android(perhaps in a sub-optimal way), without
> busy-waiting.

The problem is making that robust if the plugin process hangs or dies.
Here's a thought
 - in SetCurrent, try to atomic-swap back/front.  This should succeed most of the time.
 - if the front buffer is "busy", fall back on a synchronous swap with IPC.  Something like comment 22 if I understand the general idea correctly (some details are fuzzy).

This is faster than mutex-swap in the common case, and doesn't require an OS cross-process mutex primitive.  In the uncommon case, or on platforms where we don't have atomic swap primitives, we fall back safely.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #30)
> Here's a thought
>  - in SetCurrent, try to atomic-swap back/front.  This should succeed most
> of the time.
>  - if the front buffer is "busy", fall back on a synchronous swap with IPC. 
> Something like comment 22 if I understand the general idea correctly (some
> details are fuzzy).
> 
> This is faster than mutex-swap in the common case, and doesn't require an OS
> cross-process mutex primitive.  In the uncommon case, or on platforms where
> we don't have atomic swap primitives, we fall back safely.

Yeah, so, I'm wondering, couldn't we hide the second approach behind a 'cross-process' mutex abstraction, and only use it on android where we don't have actual cross-process mutex?

I.e. on a system without a cross-process mutex, do an interlocked compare and exchange, if it succeeds we're all good and continue, if it fails we send an IPDL synchronous message that does the job. That way the majority of our platforms that do support cross-process mutices get the easy, fast thing, and the more complex situation only needs to exist on platforms without a solid cross-process mutex.

Perhaps this is what you meant :) Just making sure. I'd hate to have this complexity in the actual async plugin code.
Depends on: 695845
No longer depends on: omtc
Depends on: omtc
Can we do this at least for the platforms that support OMTC today? Though plugins are going away, I suspect this may benefit GMPs and the CDM as well.
(In reply to Florian Bender from comment #33)
> Can we do this at least for the platforms that support OMTC today? Though
> plugins are going away, I suspect this may benefit GMPs and the CDM as well.

What's the status/plan with Quantum and this bug now? Can we close this or are there actionable items left?
Flags: needinfo?(milan)
Flags: needinfo?(bas)
I think we're covered for Flash, not sure there is much more we'd do here.
Flags: needinfo?(milan)
I concur with Milan.
Flags: needinfo?(bas)
You need to log in before you can comment on or make changes to this bug.