Closed Bug 1521964 Opened 5 years ago Closed 5 years ago

Provide a privileged API for compositing decoded video frames onto an alternate placeholder

Categories

(Core :: Audio/Video: Playback, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: mconley, Assigned: mconley)

References

Details

Attachments

(7 files, 3 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

Bug 1519885 tracks the overall work for getting user-initiated Picture in Picture for video working and shipped to users.

The original prototype developed for UX to think about has patches in bug 1520329. It uses mozCaptureStream and srcObject to plug a MediaStream from the originating <video> into a new <video> element loaded in the Picture in Picture window. That was the quick and dirty way of getting the effect that we wanted.

Talking to jya and mattwoodrow, the better way is probably to have the compositor render the decoded frames in the new Picture in Picture window, instead of doing MediaStream hackery.

So what we'd like is a way for privileged JS to tell the compositor, "take the frames rendering over at <originating video> and render them over at <picture-in-picture-window> instead".

cc'ing sotaro and nical at mattwoodrow's recommendation. Also cc'ing jbonisteel, since I might end up asking for some Graphics support for this. Worst-case scenario, I can try to implement this myself, but I'll still probably need guidance in that case.

Timeline-wise, we're hoping to have Picture in Picture riding up to beta sometime in Q1.

Here's the Trello card for jbonisteel's reference: https://trello.com/c/YOKw5qyT/3474-picture-in-picture-support-pip

An update here:

Having consulted with nical and Sotaro, I think I'm starting to converge on a design for this. Sotaro suggests that we might be able to configure the VideoSink to send frames to two different VideoFrameContainers - one for the originating tab <video>, and one for the Picture in Picture <video>. I'm looking at this option now.

Priority: -- → P3
Attached file Bug 1521964 - [WIP] Simple manual test (obsolete) —

Depends on D18705

With attachment 9041506 [details] and attachment 9041508 [details], video was not redirected when "Redirect frames" button was pushed, it was because HTMLVideoElement::GetVideoSize() did not return valid video size. The GetVideoSize() was called by nsVideoFrame. nsVideoFrame decides layout size of HTMLVideoElement.

When I modified the GetVideoSize() as to return a valid size, video was shown to 2nd video element.

Are we going to require/guarantee that the source and dest <video> elements are always in the same process (with or without fission)?

(In reply to Matt Woodrow (:mattwoodrow) from comment #5)

Are we going to require/guarantee that the source and dest <video> elements are always in the same process (with or without fission)?

Yes - the infrastructure in bug 1520329 that we're hoping to land to create the Picture in Picture window ensures that the opened window is hosting a <xul:browser> that's running in the same content process as the originating tab. It will need some slight modification once the Fission messaging APIs arrive, but it should be minor.

Component: Graphics → Audio/Video: Playback

(In reply to Sotaro Ikeda [:sotaro] from comment #4)

When I modified the GetVideoSize() as to return a valid size, video was shown to 2nd video element.

Perfect, thank you Sotaro. That seems to fix things. Instead of hard-coding values for GetVideoSize(), I modified my test case so that it clones the original <video> (which ends up copying the mMediaInfo data over).

I then modified VideoSink so that, instead of replacing mContainer, we can have a secondary VideoFrameContainer called mSecondaryContainer. By doing that, and mimicking what we already do with mContainer, I can get frames showing up in two places, which is nice.

So now that the proof-of-concept is built, I'd like to design a proper solution. Here's a strawman to ponder:

  1. Alter VideoSink so that it can have an optional secondary VideoFrameContainer, and a method to set it.
  2. Add a method to MediaDecoderStateMachine that can set the secondary VideoFrameContainer on a VideoSink on the owner thread, and another method for queueing a task to call that method on the main thread.
  3. Create a privileged WebIDL method called something like createCloneAndShareFrames() on HTMLVideoElement. It's job will be to clone the <video> that it's called on, and then set the clone's VideoFrameContainer as the secondary container for the original video's VideoSink (by calling into the MediaDecoderStateMachine methods added in (2)).

A few things that jump out at me right away:

  1. The original <video> owns the MediaDecoder / MediaDecoderStateMachine / VideoSink - that means that the VideoSink can assume pretty safely that the VideoFrameContainer for the original <video> exists - that's why it's a raw pointer. The cloned <video> I'm proposing above has different lifetime guarantees. The clone <video> can go away at any time. How do we handle that case?

One possible solution is to have the VideoSink hold the VideoFrameContainer alive with a RefPtr. That way, even if the HTMLVideoElement of the clone goes away, the VideoFrameContainer will still be held alive. That has the unfortunate consequence of keeping this secondary container around (and being sent frames) even if the PiP window has gone away - until the original <video> goes away.

A variation on that is to somehow have the VideoSink be notified if the clone goes away, and drop it's reference. I don't know what the best way of doing that is.

I initially thought we could have the VideoSink hold a weak reference to the VideoFrameContainer, but that seems like a bad idea if the VideoSink / MediaDecoderStateMachine is mostly operating on a different thread (unless we have some thread-safe WeakRef capability that I'm not aware of).

  1. If a clone exists, we need to keep sending it frames even if the original <video> ends up loading a new resource. For example, if a user is watching a YouTube playlist, we should expect the Picture in Picture window to stay open and be able to view the items in that playlist one after another without needing to re-open Picture in Picture.

If I understand correctly, once a <video> loads a new resource, in the worst case, we end up spawning a brand new MediaDecoder. I suspect that means a brand new MediaDecoderStateMachine and a new VideoSink. If we've got a clone sitting around, we'll need to detect this case, and hook up the secondary VideoFrameContainer on the new VideoSink. I'm not sure what the best hooks are for doing something like that.

Hey sotaro, any suggestions on the sanest way to hook this up given the above requirements?

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #7)

One possible solution is to have the VideoSink hold the VideoFrameContainer alive with a RefPtr. That way, even if the HTMLVideoElement of the clone goes away, the VideoFrameContainer will still be held alive. That has the unfortunate consequence of keeping this secondary container around (and being sent frames) even if the PiP window has gone away - until the original <video> goes away.

It seems OK to make VideoFrameContainer alive with a RefPtr, since RefPtr<VideoFrameContainer> is already held multiple places like the following.

If we use RefPtr<VideoFrameContainer> in VideoSink for secondary VideoFrameContainer, we need to unregister the secondary VideoFrameContainer, when it is not necessary. HTMLMediaElement might be a good place to control it.

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #7)

(In reply to Sotaro Ikeda [:sotaro] from comment #4)

When I modified the GetVideoSize() as to return a valid size, video was shown to 2nd video element.

I initially thought we could have the VideoSink hold a weak reference to the VideoFrameContainer, but that seems like a bad idea if the VideoSink / MediaDecoderStateMachine is mostly operating on a different thread (unless we have some thread-safe WeakRef capability that I'm not aware of).

thread-safe WeakRef exists as ThreadSafeWeakPtr. It was added by Bug 1404742. But it seems not fit well to this use case, since the weak ref continue to exist in VideoSink.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #7)

If I understand correctly, once a <video> loads a new resource, in the worst case, we end up spawning a brand new MediaDecoder. I suspect that means a brand new MediaDecoderStateMachine and a new VideoSink. If we've got a clone sitting around, we'll need to detect this case, and hook up the secondary VideoFrameContainer on the new VideoSink. I'm not sure what the best hooks are for doing something like that.

HTMLMediaElement might be a good place to control it. HTMLMediaElement creates MediaDecoder, holds at most one MediaDecoder and control the playback. It is set by HTMLMediaElement::SetDecoder().

https://searchfox.org/mozilla-central/rev/fcd461352655be83000062a2166722abc8f16768/dom/html/HTMLMediaElement.cpp#7157

Though it is a bit tricky to set secondary VideoFrameContainer to VideoSink, since when MediaDecoderStateMachine is created, VideoSink does not exist yet. VideoSink is created asynchronously via MediaDecoder::SetSink().

https://searchfox.org/mozilla-central/rev/6e1c6f8946af98a631219ee8125dd4bef834effb/dom/media/MediaDecoder.cpp#173

Okay, thanks for the feedback sotaro - I've modified my proposal a bit based on it.

  1. Add two new private members to HTMLVideoElement: RefPtr<HTMLVideoElement> mCloneSource and RefPtr<HTMLVieoElement> mCloneTarget. We'll also add some protected setters for those members to assert that when setting one, the other is nullptr.

    We'll need to add some cycle-collector incantations to ensure that two HTMLVideoElements pointing at one another like this won't cause a leak.

  2. Introduce a new HTMLVideoElement WebIDL method called cloneFramesTo which takes a "target" HTMLVideoElement to send frames to.

    One of the rules will be that both the <video> that cloneFramesTo is being called on, and the target <video>, must both have HTMLMediaElement::mUnboundFromTree set to false, otherwise the method throws right away.

    Then, set the mCloneTarget member on self, and set mCloneSource on the target HTMLVideoElement using the setter created in 1.

    Next, it will check to see if an mDecoder already exists and is set up. If it doesn't, it'll bail out. If an mDecoder is ready to go, it'll access the MediaDecoderStateMachine, and instruct it to set the target's VideoFrameContainer as the secondary container.

  3. Override HTMLMediaElement::FinishDecoderSetup in HTMLVideoElement. Call the parent class implementation first. Then check to see if we've got an mCloneTarget set. If so, call the method that will access the MediaDecoderStateMachine and add the target's VideoFrameContainer as a secondary container. This should, if I understand correctly, handle the case where the source HTMLVideoElement loads a new video resource.

  4. If a HTMLVideoElement ever enters UnbindFromTree where it has mCloneSource set, tell the mCloneSource to access its MediaDecoderStateMachine (if available), and drop the secondary VideoFrameContainer. Clear the mCloneSource member, and the mCloneTarget member from the original video.

  5. If a HTMLVideoElement ever enters UnbindFromTree where it has mCloneTarget set, tell the mCloneTarget to drop its reference to self, and drop the reference to mCloneTarget.

I think the above vaguely gives me the requirements I need - namely, cloned frames, lifetime management (where the connection between the two HTMLVideoElements is broken as soon as one is removed from the DOM tree), and the ability to handle the case where the original video element loads a new resource.

Note that I put most of the logic inside of HTMLVideoElement instead of HTMLMediaElement, since I don't think this sort of thing will ever be used for other types of HTMLMediaElements.

Does the above sound reasonable, sotaro? How about you jya, any concerns with this idea?

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(jyavenard)

That sounds very reasonable to me! Single process makes it much more manageable :)

Yea, it is reasonable from gfx point of view :-)

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #11)

Note that I put most of the logic inside of HTMLVideoElement instead of HTMLMediaElement, since I don't think this sort of thing will ever be used for other types of HTMLMediaElements.

If it is implemented up to release quality, HTMLMediaElement might needs a lot of change. For example, HTMLMediaElement has 2 video playback mode, mDecoder and mSrcStream. This change is going to add third playback mode.

Sounds good to me too.

LGTM as starting point.

This is very exciting, and can be the base for full PiP support.

Flags: needinfo?(jyavenard)

These patches are still fluid while I work out lifetimes and some remaining bugs, but they'll at least describe where I'm trying to go if people want to take a look.

Attachment #9044339 - Attachment is obsolete: true
See Also: → 1528696

Hm... I'm getting negative leak reports on my try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e99d64972dbca4e56d8bd13342ea880ef9cba51e&selectedJob=229712357

jya, I see that you've dealt with something similar in bug 1500200... any idea how I should approach this?

Also, the TV tests are failing because it can't find my test? Hey gbrown, any idea what I'm doing wrong here with the TV tests?

Flags: needinfo?(jyavenard)
Flags: needinfo?(gbrown)
Attachment #9041506 - Attachment is obsolete: true
Attachment #9041508 - Attachment is obsolete: true

(In reply to Mike Conley (:mconley) (:⚙️) from comment #24)

Also, the TV tests are failing because it can't find my test? Hey gbrown, any idea what I'm doing wrong here with the TV tests?

test_cloneFramesTo is trouble because it is a chrome test (in chrome.ini) and also a media test (subsuite=media). We have chrome tests with subsuite clipboard and subsuite gpu, each of which have special tasks (mochitest-clipboard, mochitest-gpu) to handle them. But I think there is nothing expecting to run chrome/media tests.

Does test_cloneFramesTo actually run in any task? I don't see it running in mochitest-media.

Flags: needinfo?(gbrown)

(In reply to Geoff Brown [:gbrown] from comment #25)

Does test_cloneFramesTo actually run in any task? I don't see it running in mochitest-media.

It just runs when I run it manually with ./mach mochitest <path>. Do I have to register it somewhere to make it part of a "task"? What do you recommend I do?

Flags: needinfo?(gbrown)

:jmaher -- You are probably more familiar with subsuite considerations? mconley is adding a mochitest-chrome test (in a chrome.ini) with subsuite=media. That would be the first test that is both chrome and media, and as such doesn't run anywhere in continuous integration yet. Do we need to expand mochitest-media to run chrome-media? Do you recall how to do that?

Flags: needinfo?(gbrown) → needinfo?(jmaher)

if we look at clipboard that will be an example of running multiple test harnesses as a single job (added in bug 1269872). You can see in the taskcluster config we call different test targets:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/mochitest.yml#184

The other way is to add a separate job for M(mda-c), this would be non e10s I assume. Do we expect many other tests to fall into this bucket? If not, then running mochitest-media-chrome as part of mochitest-media seems more logical (first option in this comment)

lastly, does this need to run as mochitest-media, or can it run with all the other mochitest-chrome tests in that specific job?

Regarding the test-verify concerns, we might need to update logic in one or both of these places:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/util/perfile.py#35
https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/per_test_base.py#45

Flags: needinfo?(jmaher)
Attachment #9044338 - Attachment description: Bug 1521964 - Add privileged HTMLVideoElement.cloneFramesTo WebIDL method. r?jya,ehsan → Bug 1521964 - Add privileged HTMLVideoElement.cloneElementVisually WebIDL method. r?jya,ehsan
Attachment #9045661 - Attachment description: Bug 1521964 - Don't suspend the video decoder when cloning frames. r?jya → Bug 1521964 - Don't suspend the video decoder when cloning a video visually. r?jya
Attachment #9044340 - Attachment description: Bug 1521964 - Add cloneFramesTo regression tests. r?jya → Bug 1521964 - Add cloneElementVisually regression tests. r?jya

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #28)

lastly, does this need to run as mochitest-media, or can it run with all the other mochitest-chrome tests in that specific job?

I don't mind running this with the other mochitest-chrome tests. Honestly, whatever is easiest is fine with me, so long as the media folk are happy with it.

What constitutes the least amount of work, but lets the test run in automation? Should I move the test to a different folder?

Flags: needinfo?(jmaher)

Hi sotaro,

So I think I've got something almost landable here (modulo the test stuff, and this negative leak I'm getting in automation...), but I've noticed a small bug that I can reproduce reliably manually, but can't seem to write a decent automated test for.

I'm noticing that in certain situations, even though frames are being sent from the VideoSink to the secondary VideoFrameContainer, they do not appear on screen until a main thread paint occurs (any kind of main thread paint will do, it seems). Do you know how I'd start to debug that?

Flags: needinfo?(sotaro.ikeda.g)

(I think I've confirmed that an ImageClient is connected)

the quickest solution here is to add the single test to the existing mochitest-chrome job. The only quirk is that mochitest-media requires virtual-with-gpu for windows-qr and mochitest-chrome doesn't require it. We should test this on windows10 in going this route. If this doesn't work, I can help you get the mda-c job created.

Flags: needinfo?(jmaher)

Keep in mind that mochitest-chrome always runs with e10s off so it's only appropriate for things that don't try to use content or GPU processes.

https://searchfox.org/mozilla-central/rev/dc0adc07db3df9431a0876156f50c65d580010cb/taskcluster/ci/test/mochitest.yml#171

Huh, in that case, maybe the simplest thing is to port my test over to a mochitest-plain test.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #24)

Hm... I'm getting negative leak reports on my try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e99d64972dbca4e56d8bd13342ea880ef9cba51e&selectedJob=229712357

jya, I see that you've dealt with something similar in bug 1500200... any idea how I should approach this?

Also, the TV tests are failing because it can't find my test? Hey gbrown, any idea what I'm doing wrong here with the TV tests?

For that last instance it was a default constructor operator that didn't set MOZ_COUNT_CTOR.
Check where a HTMLMediaElement is copied/constructed that wouldn't call MOZ_COUNT_CTOR

In https://phabricator.services.mozilla.com/D20021, you have changed and added various NS_IMPL_ISUPPORTS_CYCLE_COLLECTION_blah

that's where I would start investigating. It could be that it's really double-freed.

Flags: needinfo?(jyavenard)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #31)

Hi sotaro,

So I think I've got something almost landable here (modulo the test stuff, and this negative leak I'm getting in automation...), but I've noticed a small bug that I can reproduce reliably manually, but can't seem to write a decent automated test for.

I'm noticing that in certain situations, even though frames are being sent from the VideoSink to the secondary VideoFrameContainer, they do not appear on screen until a main thread paint occurs (any kind of main thread paint will do, it seems). Do you know how I'd start to debug that?

Main thread paint is necessary to update display list and layer transaction of video. It is triggered by VideoFrameContainer::InvalidateWithFlags(). It calls HTMLMediaElement::Invalidate().

The InvalidateWithFlags() is called from VideoFrameContainer::Invalidate() and MediaDecoder::InvalidateWithFlags().

For async image container, ClientImageLayer and ImageClientBridge are created on MainThread.

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Sotaro Ikeda [:sotaro] from comment #37)

Main thread paint is necessary to update display list and layer transaction of video. It is triggered by VideoFrameContainer::InvalidateWithFlags(). It calls HTMLMediaElement::Invalidate().

The InvalidateWithFlags() is called from VideoFrameContainer::Invalidate() and MediaDecoder::InvalidateWithFlags().

For async image container, ClientImageLayer and ImageClientBridge are created on MainThread.

Thanks, sotaro. I think I figured this out - I needed to Invalidate the visual clone's VideoFrameContainer anytime the original video is Invalidated.

Depends on: 1530862

(In reply to Mike Conley (:mconley) (:⚙️) from comment #38)

Thanks, sotaro. I think I figured this out - I needed to Invalidate the visual clone's VideoFrameContainer anytime the original video is Invalidated.

By the way, if there is no layout change, we do not need to invalidate it.

Blocks: 1528459
Blocks: 1528672
Blocks: 1528696
See Also: 1528696
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fbfa1fe0e12a
Add clone target and source members to HTMLMediaElement and setters on HTMLVideoElement. r=jya,mccr8
https://hg.mozilla.org/integration/autoland/rev/b7a51e2d1d66
Allow VideoSink to have a secondary VideoFrameContainer assigned to it. r=jya
https://hg.mozilla.org/integration/autoland/rev/45807d96ca7f
Add privileged HTMLVideoElement.cloneElementVisually WebIDL method. r=jya,Ehsan,smaug
https://hg.mozilla.org/integration/autoland/rev/e5e08269a546
Don't suspend the video decoder when cloning a video visually. r=jya
https://hg.mozilla.org/integration/autoland/rev/122c7fc5d9c5
Add cloneElementVisually regression tests. r=jya
https://hg.mozilla.org/integration/autoland/rev/5ffd3605c75c
Invalidate visually cloned element when clone source is invalidated. r=jya
https://hg.mozilla.org/integration/autoland/rev/d26b9b95a47e
Make PictureInPictureChild use the cloneElementVisually API. r=Felipe
Assignee: nobody → mconley
Depends on: 1597216
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: