Open Bug 1670108 Opened 1 year ago Updated 6 days ago

Create infrastructure for site-specific video player adapters


(Toolkit :: Picture-in-Picture, task, P3)






(Reporter: mconley, Assigned: whjones526)


(Blocks 5 open bugs)



(2 files)

This is part of a larger campaign to make it easier for us to:

  1. Add a playhead scrubber
  2. Add current play time
  3. Add captions
  4. Ship the mute toggle

and a whole host of other things. We haven't shipped many controls for the Picture-in-Picture player window because of how easy it is for video playback to break on popular sites. As an example, seeking on Netflix by changing the currentTime of the <video> causes playback to simply stop.

What we'd like to do is make it so that we can develop site-specific adapter classes that map to particular domain matchers (much like what we do with toggle and keyboard control policies here: These adapters would have a common interface that the PictureInPicture code could then talk to, without having to worry about the underlying site-specific manipulation going on.

It's not clear to me how best to do this. I think we should consult with someone from the WebCompat team to see what the best way of doing this is.

Hey twisniewski,

I remember talking to you about this a few weeks back. Did you have thoughts on how best we could do this? Are their pre-existing examples of similar things that we do that we can model our solution off of?

Flags: needinfo?(twisniewski)

I spoke to twisniewski about this last week.

The plan we came up with is to have the WebCompat add-on inject content scripts on sites we want adapters for. Those content scripts would be responsible for calling into the unprivileged page APIs for things like seeking, playing, pausing, etc.

The player.js script that handles user events for various controls would need to know that an adapter exists, and if so, instead of sending down the normal JSWindowActor messages to touch the <video> directly, would send down messages to the WebExtension content script.

I'm still working out how exactly that's going to work. I'm not sure yet if the PictureInPictureChild will talk to the content script, or player.js will.

Flags: needinfo?(twisniewski)
Severity: -- → N/A
Priority: -- → P3

Thinking about this more, I think what might be simplest is if the PictureInPictureChild, upon hearing the command from the parent to do something, checks to see if a content script from the WebCompat add-on has been registered for the associated video, and if so, sends a message to that content script to do the needful with the site's player API.

I think this means creating a new Experimental API endpoint for the WebCompat add-on to signal to PictureInPictureChild that it should try to communicate with a registered content script from the WebCompat WebExtension rather than touching the <video> directly.

Hey robwu, do we have prior examples of privileged browser JS talking to content scripts from Mozilla WebExtensions? Since we're getting closer to touching and manipulating unprivileged web content here with this approach, do you have any security guidance for me here?

Flags: needinfo?(rob)

It looks like you're looking for a way to run custom scripts, triggered from PictureInPictureChild.jsm. There are two minimal requirements:

  1. Run custom script when PIP functionality is needed.
  2. Identify the <video> element from that custom script.

It looks like you're currently looking to address (1) by unconditionally running a content script via the webcompat add-on (even if PIP is not used). I'm not clearly seeing how you intend to do (2). Were you thinking of somehow passing <video> to the content script?

A way to identify the <video> element from a content script is to dispatch a cancelable CustomEvent from the PIP script to the web page. If a content script has been registered, it can handle the event and call event.preventDefault() to signal to PIP that the event has been handled. This is relatively straightforward. A downside of this approach is that the web page can detect the event and interfere with the communication.

If you don't want web pages to interfere with PIP, then you would need a direct communication channel between the PIP child and the custom script. The WebExtension framework does not support the injection of experimental APIs in content scripts, DOM references cannot be passed passed through the extension messaging APIs either. A way to still communicate directly with the webcompat content script is by calling ExtensionProcessScript.getExtensionChild(extensionId).getContext(contentWindow).cloneScope to get a reference to the Sandbox where the content scripts run, and call a function in that sandbox.

An alternative to (1), (i.e. relying on the webcompat add-on to run custom scripts) is to create a Cu.Sandbox (with an expanded principal containing only the page's principal) and run the custom script there when PIP logic is invoked. Since you already have a PictureInPictureChild.jsm and a reference to the right content window, it is straightforward too. The advantage of this is that you don't run these custom script unless needed by PIP, and you will have a direct access to the sandbox, which means that you don't have to rely on internals of the extension framework.

touching and manipulating unprivileged web content here with this approach, do you have any security guidance for me here?

Design the implementation with the assumption that these custom scripts have been compromised.

Flags: needinfo?(rob)
Assignee: nobody → mconley

Hey robwu,

I've tried to craft something sensible based on what I understand about sandboxes. The way my current model works is that wrappers are written as extensions of a base PictureInPictureVideoWrapperBase class. PictureInPictureVideoWrapperBase knows how to talk to normal <video> elements, and subclasses can talk to special site-specific APIs to do the needful for special sites (see the included netflix.js as an example for seeking).

PictureInPictureChild, upon setting up the player, creates a sandbox, and evaluates the script for the PictureInPictureVideoWrapperBase, and any registered site-specific wrapper scripts. Then we instantiate PictureInPictureVideoWrapper in the event that we have a site-specific wrapper, or PictureInPictureVideoWrapperBase in the event that we don't. Calls to control the window are then marshalled through functions defined in video_wrapper_base.js which call the wrapper instance directly.

Does this pattern look reasonable to you, or is there a better / more sensible way to do this?

Flags: needinfo?(rob)

Hey DenSchub,

I was wondering if you had any thoughts with the approach I'm going for here too. Does this seem reasonable from the WebCompat add-on perspective?

Flags: needinfo?(dschubert)

From the snippet, it seems like you want to default to the base implementation unless explicitly specified otherwise.
Furthermore, there isn't a hard requirement for the site-specific wrapper (e.g. netflix.js) to rely on anything else besides the <video> and APIs in the web page. In particular, the other additions in the sandbox (including the base class) are not required.

Given this, I suggest to make the following changes:

  • Remove the sandbox for the common, default case (i.e. run the base class in PIP child).
  • Let the sandbox only export a standalone class that adheres to the expected interface, which only takes the video object.
  • Move the base logic outside the sandbox, and let it call into the sandbox if needed. An easy way to do this without lots of code duplication is by using a Proxy over the "base class", which calls the "sandbox" implementation if available, and falls back to the original "base" implementation otherwise. Similar results can also be achieved without Proxy by explicitly stating the "overridable" functions/properties.

I also have the following questions:

  • Are the wrappers <video>-specific, or website-specific?
  • Are the sandboxes meant to be <video>-specific, or website-specific?
  • What is responsible for nuking obsolete sandboxes? Currently it seems to be tied to the window actor. I can imagine that for single-page applications like YouTube, creating a new sandbox for every video without teardown may result in memory leaks and bugs.
  • What happens when the page enters/leaves the bfcache? Logic within the sandbox should ideally not be active any more when the page is frozen, in the bfcache.

About the implementation:

(In reply to Rob Wu [:robwu] from comment #4)

is to create a Cu.Sandbox (with an expanded principal containing only the page's principal) and run the custom script there when PIP logic is invoked.

Here, "expanded principal" is very significant. Your current draft of the patch uses Cu.Sandbox(originatingDoc.nodePrincipal, {.
Using exactly the same principal as the web page (regardless of wantXrays) means that there is no meaningful isolation between the web page and the sandbox.

When the web page observes any object from the sandbox, then it will be able to execute code in the sandbox, e.g. by walking up the prototype chain up to the Function constructor and then invoking it.

If you want to have some form of isolation, put brackets around the principal so that the sandbox uses an ExpandedPrincipal that subsumes the page's principal. Then the web page will be denied access to values from the sandbox (but the sandbox can still access content from the page, possibly through .wrappedJSObject if needed).

To see the difference:

  1. Visit and open the Browser Content Toolbox.
  2. Run:
win = tabs[0].content;
princ = win.document.nodePrincipal;
s1 = Cu.Sandbox(princ, { sandboxPrototype: win });
s2 = Cu.Sandbox([princ], { sandboxPrototype: win });

s1.eval(`window.wrappedJSObject.test1 = function(){ return "s1"; }`)
s2.eval(`window.wrappedJSObject.test2 = function(){ return "s2"; }`)
  1. Open the F12 devtools in and run:
test1(); // Result: "s1"
(new (test1.constructor)("return this"))() // Sandbox {}
test2(); // Result: "Uncaught Error: Permission denied to access object"

The first shows your current behavior.
The second shows an exploit of the vulnerability.
The third shows how the use of an ExpandedPrincipal results in the desired enforcement of security policies.

Flags: needinfo?(rob)

Sorry it took me this long to get back to you, Mike. I wanted to chat with Ksenia and Tom first, and then I got a bit distracted. :)

I assume that the adapters in your patch are not meant as a temporary workaround until something else is fixed, but they're meant as a permanent solution to make PiP more compatible. Is that assumption correct? If it is correct, I'm actually not super happy with our WebCompat Interventions addon being the vessel. I have multiple reasons, and I should explain them a bit!

  • The main idea behind our addon is to ship ephemeral interventions to fix something until either the site has fixed itself or we shipped a fix for something in Gecko. We have a fixed QA process that validates all currently shipped interventions to remove them as soon as possible. Shipping a lot of permanent'y interventions makes our code a bit harder to maintain - especially when we're talking about Picture-in-Picture stuff, which is not even covered by our current QA process.
  • We aim to have no hard Gecko-dependencies, besides some experimental WebExtension APIs and some TP stuff. The reason here is that we can ship a new release of our addon directly to release users via Balrog, and we have done so multiple times to address critical issues. Adding Gecko dependencies creates a bit of complexity here. We have to make sure that the dependencies exist in our rollout target versions, or we have to verify that the addon fails gracefully if they do not. I can't expect you to do that research, but the WebCompat team and WebCompat QA resources are also not really knowledgeable enough in PiP stuff to be responsible for that. So that creates a bit of a black hole where nobody is responsible, and in the worst case, we'd discover some breakage while trying to roll out a time-critical release.
  • We try to avoid platform-specific code as much as possible. The addon is not only shipped to Firefox Desktop, but it's also available as an android-component, and the entire PiP source is dead weight there (and may potentially even cause issues if the bindings are not designed to fail gracefully).

Since you're not really using any of our addon's "magic" (things like the about:compat UI, our URL-targeting system, console logging, ...) I wonder if you're open to the idea of moving your PiP interventions code into its own little system addon. The overhead of that is generally relatively low: if you're just shipping in m-c, you pretty much just have to touch the browser/extensions/, and create a simple manifest.json and for your code. Tom has already offered his help with that if needed, and I'm happy to help with code and coordination if required as well. I see a couple of advantages if you'd have your own addon:

  • You could do pretty much whatever you want without having to involve WebCompat folks in reviews and coordination.
  • We could stop worrying about cross-platform compatibility and failing gracefully for older Firefox versions.
  • Experimenting would be a lot easier for you as well: if you want to experiment on some PiP stuff, you already have your own vessel for that.

Note that I'm not blocking this patch's landing - if you want to get this shipped, I think having it in the WebCompat addon is fine for now. We can always talk about moving it at a later point, but I wanted to throw that idea out there so we can consider that.

Any thoughts? :)

Flags: needinfo?(dschubert) → needinfo?(mconley)
Assignee: mconley → jack1391
Flags: needinfo?(mconley)
Assignee: jack1391 → whjones526
Blocks: 1685549
See Also: → 1690076
See Also: → 1696796
Attachment #9210297 - Attachment description: Bug 1670108 - Infrastructure for site-specific adapters using indepedent video wrapper classes. → Bug 1670108 - [WIP] Infrastructure for site-specific adapters using indepedent video wrapper classes.
Attachment #9210297 - Attachment description: Bug 1670108 - [WIP] Infrastructure for site-specific adapters using indepedent video wrapper classes. → Bug 1670108 - Make site-specific adapters using independent video wrapper classes. r?mhowell, robwu
Attachment #9210297 - Attachment description: Bug 1670108 - Make site-specific adapters using independent video wrapper classes. r?mhowell, robwu → Bug 1670108 - Make site-specific adapters using independent video wrapper classes. r?mhowell,robwu
Blocks: 1687966
Blocks: 1704304
Component: Video/Audio Controls → Picture-in-Picture
Version: unspecified → Trunk
Blocks: 1742457
You need to log in before you can comment on or make changes to this bug.