Closed Bug 750943 Opened 12 years ago Closed 12 years ago

Define abstract interface for hardware device access: MediaEngine

Categories

(Core :: WebRTC, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla15
blocking-kilimanjaro: ?
blocking-basecamp: -

People

(Reporter: anant, Assigned: anant)

References

Details

(Whiteboard: [qa-])

Attachments

(1 file, 8 obsolete files)

For implementing media functions like getUserMedia, a common XPCOM interface will ease implementation across Android, Desktop and B2G.
Attached file Initial proposal for XPCOM interface (obsolete) —
This is a straw-man IDL Fabrice and I cooked up earlier today. We should start with this interface and add whatever we'll need as we move further along in the implementation efforts. It should be a good starting point for gUM implementation on all Gecko platforms. Some notes:

- get/setCapabilities take a jsval, intentionally free-form. This will contain fields and values like resolution, frame-rate, color-space (YUV, I420, RGB32, etc.) and so on. The properties from the GIPS libraries are OK to start with, but we should have these be as abstract as possible.

- Allocate reserves a particular device, but does not start the stream. A subsequent start call on the audio/video device object will turn it on and return an nsIMedia[Video|Audio]Stream.

- We will need to extend nsIDOMMediaStream to be able to construct a MediaStream object by composing an nsIMediaVideoStream and an nsIMediaAudioStream together.

- The error object will need to be filled in with various error values. I've combined all error codes into a single interface for brevity's sake; a case can be made for separate objects for each error callback type.
There is the question of why we must use XPCOM to define this interface! While it may seem like overkill, my opinion is that we'll want nsIMediaSource and friends to be accessible from JavaScript, for instance to implement a preview window or a permission prompt in Firefox. In the long run, I also think it is healthier to allow these components to be extensible by add-ons, for example. Neither of these would be possible if we implemented this as a C++ virtual class.

Would love to get more feedback on this, though; if there are compelling reasons to avoid XPCOM, it would be good to discuss them now.
roc's comments here would be helpful.

What is the expected usage pattern for Mediasource::allocate()?
None of these would be exposed to the Web or Web apps, right?

I'm not that motivated to support extending these in an add-on, to be honest, if it adds more work for us.

Enabling chrome UI is a reasonable use-case. But I wonder if we could leverage the Web-facing API there. E.g., fire an event at the chrome event handler when a Web page calls getUserMedia, and have the chrome event handler respond by calling getUserMedia itself (which always succeeds immediately when called with chrome privileges) and then have a special chrome-only API which can be called to pass a media stream (or an error) back to the untrusted code?
Attached patch XPCOM interface - v1 (obsolete) — Splinter Review
I forgot to add the vital nsIMediaEngine interface in the first IDL! This interface allows enumeration of nsIMediaVideo/AudioSources.
Attachment #620205 - Attachment is obsolete: true
(In reply to Randell Jesup [:jesup] from comment #3)
> What is the expected usage pattern for Mediasource::allocate()?

Allocate() should be called after the user authorizes a getUserMedia() call - but we don't have to turn on the device yet, just make sure that another application can't grab it.

When the returned MediaStream is actually assigned to an output (or a request to make a ProcessedMediaStream is received), we can call Start() to turn on the device and start pushing frames.
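
A rough caller-side sketch of the lifecycle being described (purely illustrative; the method and type names follow the C++ interface this bug later settles on, and the surrounding glue is assumed):

  // After the user authorizes getUserMedia: reserve the device, but leave it off.
  aSource->Allocate();
  // ... later, when the returned MediaStream is actually consumed ...
  aSource->Start(aStream, aTrackID);   // power on the device and start pushing frames
  // ... and when the stream is no longer used ...
  aSource->Stop();
  aSource->Deallocate();               // hand the device back to the system
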
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> None of these would be exposed to the Web or Web apps, right?

No, this is purely a privileged Gecko interface.

> I'm not that motivated to support extending these in an add-on, to be
> honest, if it adds more work for us.

I wasn't just thinking about add-ons, but also other Gecko based "products", such as the recent "Web Runtime" (or WebRT) we checked in for Apps support (Bug 725408). The WebRT is implemented as a separate XUL application, so it cannot access private Gecko classes. An XPCOM interface would ensure that WebRT (and by extension, installed Apps) will be able to use Media functionality.

> Enabling chrome UI is a reasonable use-case. But I wonder if we could
> leverage the Web-facing API there. E.g., fire an event at the chrome event
> handler when a Web page calls getUserMedia, and have the chrome event
> handler respond by calling getUserMedia itself (which always succeeds
> immediately when called with chrome privileges) and then have a special
> chrome-only API which can be called to pass a media stream (or an error)
> back to the untrusted code?

Yes, we were kicking around the idea of reusing getUserMedia for chrome too, but I think the API is missing some essential functionality to make it work. nsIMediaEngine lets you enumerate all local audio/video devices, as well as obtain names/UUIDs for them. We will need to do both of these in our permission prompt for getUserMedia. I suspect we will also come across some other low-level things we will need to do as we move further along in the implementation, at which point we can add them to the XPCOM interface.

We could put similar APIs into the privileged version of getUserMedia, but it felt cleaner to just go for a dedicated interface. It also avoids confusion between a privileged and a non-privileged getUserMedia, as there is only one: the API exposed to content.
Anant: what is the use-case for an add-on accessing the media streams directly (and bypassing the security prompts!)?  Seems like a security hole someone could drive a truck through (with a not-kosher add-on).

roc's suggestion sounds workable. We have talked about having 'live' preview from all the cameras so you can select which one to use; to build the chrome UI for this we'd need to be able to enumerate the video and audio sources, and of course control which one we're opening.

If multiple video tracks (or audio tracks) are allowed in a single getUserMedia() call, then the Chrome UI will need to be able to assemble a "new" MediaStream composited from the tracks from multiple preview streams.  And it certainly will need to remove tracks from a preview stream (assuming getUserMedia returns a "paired" set of audio and video streams....)  

For UI/Preview use, it may make more sense/be simpler for the Chrome to ask for only a single track in each getUserMedia() call.
(In reply to Randell Jesup [:jesup] from comment #8)
> Anant: what is the use-case for the add-on accessing the mediastreams
> directly (and bypassing the security prompts!)?  Seems like a security hole
> someone could drive a truck through (with a not-kosher addon).

That's a security risk with add-ons in general, not with this particular interface. All add-ons already have full system access (i.e. they can get to your camera anyway); there is really no point in trying to make it harder for them.

As for use cases, when I was working on the Rainbow add-on, I had to limit some pieces of functionality because some aspects of the video rendering pipeline are private to Gecko (for good reason). But we should allow extensibility of our platform where we can, and my intuition is that in this case we can, without much extra work. It also allows third parties to write custom backends for this interface to support some newfangled hardware. The purpose of this interface is to serve as a "HAL" above anything else - a common way to get at media devices in a platform-independent way.

Add-ons aside, I am more worried about WebRT. We're already having to work around Geolocation being a private class - WebRTs need to be able to do their own permission prompts depending on the context in which the app was installed and so on. I don't want to repeat the same mistake with Media.

> roc's suggestion sounds workable.  We have talked about having 'live'
> preview from all the cameras so you can select which one to use; to build
> the chrome UI for this we'd need to be able to enumerate the video and audio
> sources, and of source control which one we're opening.  

I don't doubt that we can make it work using getUserMedia. But it can only be done if we make special affordances for privileged code making the API call, and having only *one* function be the sole gateway for chrome code to get at the underlying media hardware could, in my opinion, quickly get out of hand (ioctl?!).
 (In reply to Anant Narayanan [:anant] from comment #9)

> Add-ons aside, I am more worried about WebRT. We're already having to work
> around Geolocation being a private class - WebRTs need to be able to do
> their own permission prompts depending on the context in which the app was
> installed and so on. I don't want to repeat the same mistake with Media.

Totally off-topic, but geolocation uses nsIContentPermissionPrompt, which gives you the opportunity to implement your own prompt. Can't you do that in WebRT?
(In reply to Anant Narayanan [:anant] from comment #6)
> Allocate() should be called after the user authorizes a getUserMedia() call
> - but we don't have to turn on the device yet, just make sure that another
> application can't grab it.

I'm not sure I can imagine a user explicitly authorizing a device without getting some kind of preview (either video for a camera, or audio levels for a microphone). So I'm not sure there really is a case where we don't have to turn on the device yet.

Also, have you made sure you can actually implement this on all platforms?
(In reply to Fabrice Desré [:fabrice] from comment #10)
>  Totally off-topic, but geolocation uses nsIContentPermissionPrompt that
> gives you the opportunity to implement your own prompt. Can't you do that in
> webRT?

nsIContentPermissionPrompt lets us provide a custom permission prompt (or bypass it if the app was pre-authorized), but we can't add features like allowing the user to "lie" about their location without injecting our own API. In other words, it would have been nice to be able to override some methods of the GeolocationService class.
(In reply to Timothy B. Terriberry (:derf) from comment #11)
> (In reply to Anant Narayanan [:anant] from comment #6)
> > Allocate() should be called after the user authorizes a getUserMedia() call
> > - but we don't have to turn on the device yet, just make sure that another
> > application can't grab it.
> 
> I'm not sure I can imagine a user explicitly authorizing a device without
> getting some kind of preview (either video for a camera, or audio levels for
> a microphone). So I'm not sure there really is a case where we don't have to
> turn on the device yet.

Actually, I think the more common case will be getUserMedia returning without a permission prompt. Most users/sites will want to persist camera permissions, at least for places like Google Hangouts. In the first-run case, however, you are right that Allocate() will be immediately followed by a Start().

> Also, have you made sure you can actually implement this on all platforms?

Not yet! The interface here is simply a close mapping of the GIPS API, so we know we can implement this at least on Desktop (and possibly Android). I'm sure we'll need several tweaks to the API as we move further along in the implementations. We weren't planning to commit the IDL to the tree until we've reached a reasonably stable interface and have finished at least the desktop and b2g implementations of this component.
(In reply to Anant Narayanan [:anant] from comment #7)
> I wasn't just thinking about add-ons, but also other Gecko based "products",
> such as the recent "Web Runtime" (or WebRT) we checked in for Apps support
> (Bug 725408). The WebRT is implemented as a separate XUL application, so it
> cannot access private Gecko classes. An XPCOM interface would ensure that
> WebRT (and by extension, installed Apps) will be able to use Media
> functionality.

I don't understand how this is relevant. Web Apps will be using getUserMedia. Any XUL in WebRT would have the same needs as the browser UI.

> Yes, we were kicking around the idea of reusing getUserMedia for chrome too,
> but I think the API is missing some essential functionality to make it work.
> nsIMediaEngine lets you enumerate all local audio/video devices, as well as
> obtains names/uuids for them. We will need to do both of these in our
> permission prompt for getUserMedia.

Just make chrome getUserMedia return a MediaStream containing tracks for all available devices, and expose metadata on each track identifying the device. Tracks can be added and removed from MediaStreams dynamically, so this also handles dynamic device addition/removal.

> I suspect we will also come across some
> other low-level things we will need to do as we move further along the
> implementation, at which point we can add them to the XPCOM interface.

We can probably add them to getUserMedia/MediaStream too.

> We could put in similar APIs into the privileged version of getUserMedia,
> but it felt cleaner to just go for a dedicated interface. It also avoids
> confusion between the privileged and non-privileged getUserMedia, as there
> is only one, which is the API exposed to content.

I'm not too concerned about that. I'm much more concerned about divergent APIs between chrome and content, which has historically been a source of pain. (Often because the content API keeps growing new features, and we have to keep duplicating those features in the separate chrome API.)

(In reply to Randell Jesup [:jesup] from comment #8)
> If multiple video tracks (or audio tracks) are allowed in a single
> getUserMedia() call, then the Chrome UI will need to be able to assemble a
> "new" MediaStream composited from the tracks from multiple preview streams. 
> And it certainly will need to remove tracks from a preview stream (assuming
> getUserMedia returns a "paired" set of audio and video streams....)  

That's OK because the MediaStreams spec already defines a constructor that supports this.

(In reply to Anant Narayanan [:anant] from comment #9)
> As for use cases, when I was working in the Rainbow add-on, I had to limit
> some pieces of functionality because there were some aspects of the video
> rendering pipeline that are private to Gecko (for good reason). But we
> should allow extensibility of our platform where we can,

I don't think we should plumb extensibility into our core infrastructure. I think the model we've followed for video codecs is a good one: our codec interface is all in C++, it's not extensible through XPCOM, and it's not frozen. This makes our code simpler, faster and more maintainable. You can patch in custom decoders at build time. You can also patch in a custom decoder that acts as glue to a run-time-extensible framework with a more frozen interface --- we've recently landed GStreamer support that way.

I'd like to follow that model here: make the code we ship all-C++. Other media sources can be patched in at build time. If we later want to add extension points for run-time-pluggable media sources, we can do that.
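
As a hedged illustration of that build-time model (the backend class names and configure macros below are hypothetical, not something defined in this bug):

  // Each product compiles in exactly one backend behind the same abstract header.
  MediaEngine* engine;
  #if defined(MOZ_B2G)
    engine = new MediaEngineGonk();      // hypothetical gonk/B2G backend
  #elif defined(MOZ_WEBRTC)
    engine = new MediaEngineWebRTC();    // hypothetical GIPS-backed desktop backend
  #else
    engine = new MediaEngineDefault();   // hypothetical software fallback backend
  #endif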

> Add-ons aside, I am more worried about WebRT. We're already having to work
> around Geolocation being a private class - WebRTs need to be able to do
> their own permission prompts depending on the context in which the app was
> installed and so on. I don't want to repeat the same mistake with Media.

WebRT shouldn't have to work around anything. We should modify our code to expose the interfaces WebRT needs (but *not* all the interfaces we think WebRT might possibly need in the future).

Using XPCOM is no substitute for thoughtful interface design.

> I don't doubt that we can make it work using getUserMedia. But it can only
> be done when we make special affordances for privileged code making the API
> call, and having only *one* function be the sole gateway for chrome code to
> get at the underlying media hardware, in my opinion, could quickly get out
> of hand (ioctl?!).

Not that bad, since MediaStreams or their tracks can expose whatever API we want.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> (In reply to Anant Narayanan [:anant] from comment #7)
> > I wasn't just thinking about add-ons, but also other Gecko based "products",
> > such as the recent "Web Runtime" (or WebRT) we checked in for Apps support
> > (Bug 725408). The WebRT is implemented as a separate XUL application, so it
> > cannot access private Gecko classes. An XPCOM interface would ensure that
> > WebRT (and by extension, installed Apps) will be able to use Media
> > functionality.
> 
> I don't understand how this is relevant. Web Apps will be using
> getUserMedia. Any XUL in WebRT would have the same needs as the browser UI.

It would have similar needs, but not quite the same. But we can let WebRT customize the UI via nsIContentPermissionPrompt like in Geolocation's case, I suppose, even without XPCOM.

> Just make chrome getUserMedia return a MediaStream containing tracks for all
> available devices, and expose metadata on each track identifying the device.
> Tracks can be added and removed from MediaStreams dynamically, so this also
> handles dynamic device addition/removal.

Ok, this sounds like a good start.

> > We could put in similar APIs into the privileged version of getUserMedia,
> > but it felt cleaner to just go for a dedicated interface. It also avoids
> > confusion between the privileged and non-privileged getUserMedia, as there
> > is only one, which is the API exposed to content.
> 
> I'm not too concerned about that. I'm much more concerned about divergent
> APIs between chrome and content, which has historically been a source of
> pain. (Often because the content API keeps growing new features, and we have
> to keep duplicating those features in the separate chrome API.)

Fair enough.

> > Add-ons aside, I am more worried about WebRT. We're already having to work
> > around Geolocation being a private class - WebRTs need to be able to do
> > their own permission prompts depending on the context in which the app was
> > installed and so on. I don't want to repeat the same mistake with Media.
> 
> WebRT shouldn't have to work around anything. We should modify our code to
> expose the interfaces WebRT needs (but *not* all the interfaces we think
> WebRT might possibly need in the future).
> 
> Using XPCOM is no substitute for thoughtful interface design.

So if we don't use XPCOM to define this hardware abstraction layer, shall we repurpose this bug to define a C++ header instead? Each platform can write a class implementing that interface; this will help us because we'll only have to write the getUserMedia DOM bindings once.
(In reply to Anant Narayanan [:anant] from comment #15)
> So if we don't use XPCOM to define this hardware abstraction layer, shall we
> re-purpose this bug to instead define a C++ header instead? Each platform
> can write a class implementing that interface, this will help us because
> we'll have to write the getUserMedia DOM bindings only once.

Yes please!
Blocks: 752351
Changed title, filed Bug 752351 to track implementation of the first "fallback" backend for this MediaEngine interface.

Suhas already has a similar interface written up in Bug 739566. We should probably take the minimal amount we need for an initial getUserMedia implementation on two platforms (b2g & desktop) and land it via this bug.
Summary: getUserMedia and family needs a common backend XPCOM interface → Define abstract interface for hardware device access: MediaEngine
Blocks: 752352
Attached patch C++ Interface - v1 (obsolete) — Splinter Review
A simple MediaEngine interface. Most of the methods map directly to their GIPS equivalents, so the desktop backend should be easy. Fabrice, let me know how this works for B2G.

I wasn't able to figure out how to make a SourceMediaStream without creating a corresponding nsDOMMediaStream. Is that possible?

There's also a fallback implementation for all platforms that I will attach shortly in bug 752351.
Attachment #620318 - Attachment is obsolete: true
Attachment #623571 - Flags: feedback?(roc)
Attachment #623571 - Flags: feedback?(fabrice)
(In reply to Anant Narayanan [:anant] from comment #18)
> I wasn't able to figure out how to make a SourceMediaStream without creating
> a corresponding nsDOMMediaStream. Is that possible?

It's not quite trivial because currently lifetime management depends on nsDOMMediaStream cleaning up its MediaStream. But I'm actually working on fixing that myself. For now, just create the nsDOMMediaStream.
Comment on attachment 623571 [details] [diff] [review]
C++ Interface - v1

Review of attachment 623571 [details] [diff] [review]:
-----------------------------------------------------------------

feedback+ other than that, although I don't have much insight into what this interface should be.

::: content/media/nsMediaEngine.h
@@ +69,5 @@
> +  kVideoUYVY,
> +  kVideoIYUV,
> +  kVideoARGB,
> +  kVideoRGB24,
> +  kVideoUnknown

It might be better to have the underlying engine be responsible for directly creating mozilla::layers::Image objects. Then you might not need to expose video types in this interface.
Attachment #623571 - Flags: feedback?(roc) → feedback+
Comment on attachment 623571 [details] [diff] [review]
C++ Interface - v1

Review of attachment 623571 [details] [diff] [review]:
-----------------------------------------------------------------

feedback+, with a couple of other comments:
- How do you build a stream from a video source and an audio source?

Also, nit: I think we don't use the "ns" prefix anymore for new classes.

::: content/media/nsMediaEngine.h
@@ +26,5 @@
> +  virtual ~nsMediaEngine() {};
> +  
> +  /* Return an array of video sources. Also include devices that are
> +   * currently unavailable. */
> +  virtual nsMediaEngineVideoSource* EnumerateVideoDevices() = 0;

we should use real arrays there.

@@ +42,5 @@
> +public:
> +  virtual ~nsMediaEngineSource() {};
> +  
> +  virtual nsresult GetName(nsAString&) = 0;
> +  virtual nsresult GetUUID(nsAString&) = 0;

how is uuid useful for this API?

@@ +69,5 @@
> +  kVideoUYVY,
> +  kVideoIYUV,
> +  kVideoARGB,
> +  kVideoRGB24,
> +  kVideoUnknown

I agree with Robert. For instance, the gonk engine will very likely use a special image layer to benefit from the GPU.
Attachment #623571 - Flags: feedback?(fabrice) → feedback+
(In reply to Fabrice Desré [:fabrice] from comment #21)
> feeback+, with a couple of other comments :
> - How do you build a stream from a video source and an audio source?

Yes, I was just writing a nsDOMMediaStreamHelper to handle this. This class will also let you return a MediaStream along with Allocate() that you can pass upward to content, and it will automatically call Start() when content assigns the MediaStream to something (video element, PeerConnection etc.). This saves us from turning on the hardware until the media stream is actually used.

> Also, nit: I think we don't use the "ns" prefix anymore for new classes.

"moz" prefix, then? ;)

 
> ::: content/media/nsMediaEngine.h
> > +  /* Return an array of video sources. Also include devices that are
> > +   * currently unavailable. */
> > +  virtual nsMediaEngineVideoSource* EnumerateVideoDevices() = 0;
> 
> we should use real arrays there.

The pointer is a real C++ array... but,

I tried to make nsTArray work, but it cannot make an instance of a child class while casting it to the parent. What other arrays are there that we can use?

> > +  virtual nsresult GetName(nsAString&) = 0;
> > +  virtual nsresult GetUUID(nsAString&) = 0;
> 
> how is uuid useful for this API?

It's not strictly necessary, but I can imagine a case where an implementation might maintain a hash table of UUIDs to objects. It's not necessary until you're dealing with a very large number of devices, but it seemed harmless to add anyway. The other case I was thinking of was where some devices don't have names, and you could fall back to the UUID; but that's probably worse than just displaying a generic name like "Default Camera".

> > +  kVideoUYVY,
> > +  kVideoIYUV,
> > +  kVideoARGB,
> > +  kVideoRGB24,
> > +  kVideoUnknown
> 
> I agree with Robert. For instance the gonk engine will very likely use a
> special image layer to benefit from the gpu.

I'll remove it; the fallback engine wasn't using it either, since it was creating its own Image objects.
(In reply to Anant Narayanan [:anant] from comment #22)
> (In reply to Fabrice Desré [:fabrice] from comment #21)
> > feeback+, with a couple of other comments :
> > - How do you build a stream from a video source and an audio source?
> 
> Yes, I was just writing a nsDOMMediaStreamHelper to handle this. This class
> will also let you return a MediaStream along with Allocate() that you can
> pass upward to content, and it will automatically call Start() when content
> assigns the MediaStream to something (video element, PeerConnection etc.).
> This saves us from turning on the hardware until the media stream is
> actually used.
> 
> > Also, nit: I think we don't use the "ns" prefix anymore for new classes.
> 
> "moz" prefix, then? ;)

No prefix at all, like MediaStreamGraph for instance.

>  
> > ::: content/media/nsMediaEngine.h
> > > +  /* Return an array of video sources. Also include devices that are
> > > +   * currently unavailable. */
> > > +  virtual nsMediaEngineVideoSource* EnumerateVideoDevices() = 0;
> > 
> > we should use real arrays there.
> 
> The pointer is a real C++ array... but,

How do you know the number of items? Add a NULL at the end?

> I tried to make nsTArray work, but it cannot make an instance of a child
> class while casting it to the parent. What other arrays are there that we
> can use?

That's strange. You mean that you could not add a FallBackVideoSource to a nsTArray<nsMediaEngineVideoSource> ?

> > > +  virtual nsresult GetName(nsAString&) = 0;
> > > +  virtual nsresult GetUUID(nsAString&) = 0;
> > 
> > how is uuid useful for this API?
> 
> It's not strictly necessary, but I can imagine a case where an
> implementation might maintain a hash table of UUIDs to Objects. It's not
> necessary until you're dealing with a very large number of devices, but it
> seemed harmless to add anyway. The other case I was thinking of was where
> some devices don't have names, and you could fallback to the UUID; but
> that's probably worse than just displaying a generic name like "Default
> Camera".

I'm still unconvinced. What about removing it for now and adding it back if we really need it?
(In reply to Fabrice Desré [:fabrice] from comment #23)
> > > we should use real arrays there.
> > 
> > The pointer is a real C++ array... but,
> 
> How do you know the number of items? Add a NULL in the end?

Yes, that was the plan. I'd have used a Vector, but I wasn't sure if we were allowed to use STL.

> > I tried to make nsTArray work, but it cannot make an instance of a child
> > class while casting it to the parent. What other arrays are there that we
> > can use?
> 
> That's strange. You mean that you could not add a FallBackVideoSource to a
> nsTArray<nsMediaEngineVideoSource> ?

Yup, I tried the following:

nsTArray<nsMediaEngineVideoSource> sources;
nsMediaEngineDefaultVideoSource source;
sources.AppendElement(source);

and also with an explicit cast:

sources.AppendElement(static_cast<nsMediaEngineVideoSource>(source));

It might have something to do with the inheritance chain, the compiler error was that it could not instantiate a concrete object for nsMediaEngineVideoSource since it was an abstract class, so the statement:

nsTArray<nsMediaEngineVideoSource> sources;

always failed.

> > It's not strictly necessary, but I can imagine a case where an
> > implementation might maintain a hash table of UUIDs to Objects. It's not
> > necessary until you're dealing with a very large number of devices, but it
> > seemed harmless to add anyway. The other case I was thinking of was where
> > some devices don't have names, and you could fallback to the UUID; but
> > that's probably worse than just displaying a generic name like "Default
> > Camera".
> 
> I'm still unconvinced. What about removing it for now and adding it back if
> we really need it?

Works for me!
(In reply to Anant Narayanan [:anant] from comment #22)
> (In reply to Fabrice Desré [:fabrice] from comment #21)
> > feeback+, with a couple of other comments :
> > - How do you build a stream from a video source and an audio source?
> 
> Yes, I was just writing a nsDOMMediaStreamHelper to handle this. This class
> will also let you return a MediaStream along with Allocate() that you can
> pass upward to content, and it will automatically call Start() when content
> assigns the MediaStream to something (video element, PeerConnection etc.).
> This saves us from turning on the hardware until the media stream is
> actually used.
> 
>> Should the media element be allocated when the video.src is assigned, and started when the user invokes, say, video.play()?
Attached patch C++ Interface - v2 (obsolete) — Splinter Review
Lots of changes in this version. The main thing that needs review in this patch is the nsDOMMediaStreamHelper class. It is essentially a proxy for nsDOMMediaStream that creates the MediaStream on first access.

We do this so we can return a MediaStream on Allocate(), but only start filling segments when Start() is called. nsDOMMediaStreamHelper will automatically call Start() when content begins using the object (currently that means assignment to a <video> element). I'm unsure if this is the right approach, or if we should just roll this functionality into nsDOMMediaStream itself.

The other change is that we are now passing a SourceMediaStream object to Start() along with a TrackID so we can have both Audio and Video tracks in a single MediaStream, if it was requested by content.
Attachment #623571 - Attachment is obsolete: true
Attachment #624053 - Flags: review?(roc)
Attachment #624053 - Flags: review?(fabrice)
Attachment #624053 - Attachment is patch: true
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19)
> (In reply to Anant Narayanan [:anant] from comment #18)
> > I wasn't able to figure out how to make a SourceMediaStream without creating
> > a corresponding nsDOMMediaStream. Is that possible?
> 
> It's not quite trivial because currently lifetime management depends on
> nsDOMMediaStream cleaning up its MediaStream. But I'm actually working on
> fixing that myself. For now, just create the nsDOMMediaStream.

Actually I was wrong. It's entirely possible to have a MediaStream with no wrapper, just pass null for the wrapper.
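
A minimal sketch of that, assuming the MediaStreamGraph API of the time (treat the exact method name and the null-wrapper convention as assumptions based on this comment):

  // Create a SourceMediaStream that has no nsDOMMediaStream wrapper.
  SourceMediaStream* stream =
    MediaStreamGraph::GetInstance()->CreateSourceStream(nsnull);
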
(In reply to Anant Narayanan [:anant] from comment #22)
> (In reply to Fabrice Desré [:fabrice] from comment #21)
> > feeback+, with a couple of other comments :
> > - How do you build a stream from a video source and an audio source?
> 
> Yes, I was just writing a nsDOMMediaStreamHelper to handle this. This class
> will also let you return a MediaStream along with Allocate() that you can
> pass upward to content, and it will automatically call Start() when content
> assigns the MediaStream to something (video element, PeerConnection etc.).
> This saves us from turning on the hardware until the media stream is
> actually used.

For now I suggest you don't do that; just turn it on. I'll give you a NotifyStartConsuming callback on MediaStreamListener that you can use to trigger this properly.

> > Also, nit: I think we don't use the "ns" prefix anymore for new classes.
> 
> "moz" prefix, then? ;)

If it's in a C++ namespace, it doesn't need a prefix.

> > ::: content/media/nsMediaEngine.h
> > > +  /* Return an array of video sources. Also include devices that are
> > > +   * currently unavailable. */
> > > +  virtual nsMediaEngineVideoSource* EnumerateVideoDevices() = 0;
> > 
> > we should use real arrays there.
> 
> The pointer is a real C++ array... but,
> 
> I tried to make nsTArray work, but it cannot make an instance of a child
> class while casting it to the parent. What other arrays are there that we
> can use?

How about having the caller pass an nsTArray<nsMediaEngineVideoSource*> as a parameter, which gets filled in? That's what we usually do.

Having UUIDs for devices is a good idea. That will let apps remember prefs for specific devices without falling down when multiple devices have the same name.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #28)
> How about having the caller pass an nsTArray<nsMediaEngineVideoSource*> as a
> parameter, which gets filled in? That's what we usually do.

Actually, best to make it nsTArray<nsAutoPtr<nsMediaEngineVideoSource> > so ownership is handled properly.
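
A caller-side sketch of that enumeration pattern (illustrative only; it assumes the signature the next revision of the patch adopts, and 'engine' stands in for some MediaEngine implementation):

  nsTArray<nsAutoPtr<MediaEngineVideoSource> > sources;
  engine->EnumerateVideoDevices(&sources);    // engine fills in the caller's array
  for (PRUint32 i = 0; i < sources.Length(); i++) {
    nsString deviceName;
    sources[i]->GetName(deviceName);          // inspect each device before choosing one
  }
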
Comment on attachment 624053 [details] [diff] [review]
C++ Interface - v2

Review of attachment 624053 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaEngine.h
@@ +31,5 @@
> +  virtual MediaEngineVideoSource* EnumerateVideoDevices(PRInt32&) = 0;
> +
> +  /* Return an array of audio sources. Also include devices that are
> +   * currently unavailable. */
> +  virtual MediaEngineAudioSource* EnumerateAudioDevices(PRInt32&) = 0;

Let's take the nsTArray<nsAutoPtr<>> approach

@@ +42,5 @@
> +{
> +public:
> +  virtual ~MediaEngineSource() {};
> +  
> +  virtual nsresult GetName(nsAString&) = 0;

Can this really fail? Please just return void

@@ +70,5 @@
> + */
> +enum mediaEngineVideoCodecType {
> +  kVideoCodecH263,
> +  kVideoCodecVP8,
> +  kVideoCodecI420

What is this for actually?

@@ +73,5 @@
> +  kVideoCodecVP8,
> +  kVideoCodecI420
> +};
> +
> +typedef struct mediaEngineVideoOptions {

MediaEngineVideoOptions, and don't bother with the "typedef struct". We're all C++ adults here.

@@ +74,5 @@
> +  kVideoCodecI420
> +};
> +
> +typedef struct mediaEngineVideoOptions {
> +  bool interlaced;

Why do we need this?

@@ +77,5 @@
> +typedef struct mediaEngineVideoOptions {
> +  bool interlaced;
> +  unsigned int width;
> +  unsigned int height;
> +  unsigned int maxFPS;

What is maxFPS for?

@@ +78,5 @@
> +  bool interlaced;
> +  unsigned int width;
> +  unsigned int height;
> +  unsigned int maxFPS;
> +  mediaEngineVideoCodecType codecType;

use m* field names

@@ +86,5 @@
> +{
> +public:
> +  virtual ~MediaEngineVideoSource() {};
> +  /* Fills the given struct with appropriate values for all fields. */
> +  virtual nsresult GetOptions(mediaEngineVideoOptions*) = 0;

Can this just return the struct as a direct result?

@@ +104,5 @@
> + * input from a device that has been Allocate()d but not Start()ed. When the
> + * MediaStream is actually used by content, we will call Start() on the
> + * corresponding device and start pushing frames.
> + */
> +class nsDOMMediaStreamHelper : public nsIDOMMediaStream

You should be subclassing nsDOMMediaStream.
Attached patch C++ Interface - v3 (obsolete) — Splinter Review
Updated version!

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #30)
> ::: content/media/MediaEngine.h
> > +  /* Return an array of audio sources. Also include devices that are
> > +   * currently unavailable. */
> > +  virtual MediaEngineAudioSource* EnumerateAudioDevices(PRInt32&) = 0;
> 
> Let's take the nsTArray<nsAutoPtr<>> approach

Thanks, I implemented the fallback engine with this approach and it worked pretty well!

> > +  virtual nsresult GetName(nsAString&) = 0;
> 
> Can this really fail? Please just return void

Fixed, and added GetUUID back.


> > +enum mediaEngineVideoCodecType {
> > +  kVideoCodecH263,
> > +  kVideoCodecVP8,
> > +  kVideoCodecI420
> 
> What is this for actually?

Sometimes the caller may want to pick one device over another if they advertise that they are able to encode to a particular codec "natively". Most cameras just return i420, but you never know...


> > +typedef struct mediaEngineVideoOptions {
> > +  bool interlaced;
> 
> Why do we need this?

We don't, I removed it :)

> > +typedef struct mediaEngineVideoOptions {
> > +  bool interlaced;
> > +  unsigned int width;
> > +  unsigned int height;
> > +  unsigned int maxFPS;
> 
> What is maxFPS for?

Just like the codec info, the caller may prefer to Allocate() one device over another based on this value.

> > +public:
> > +  virtual ~MediaEngineVideoSource() {};
> > +  /* Fills the given struct with appropriate values for all fields. */
> > +  virtual nsresult GetOptions(mediaEngineVideoOptions*) = 0;
> 
> Can this just return the struct as a direct result?

Yes it can, fixed.

> > + * input from a device that has been Allocate()d but not Start()ed. When the
> > + * MediaStream is actually used by content, we will call Start() on the
> > + * corresponding device and start pushing frames.
> > + */
> > +class nsDOMMediaStreamHelper : public nsIDOMMediaStream
> 
> You should be subclassing nsDOMMediaStream.

I got rid of the nsDOMMediaStreamHelper class entirely, until we have the NotifyStartConsuming callback per comment #28. I will also make a separate patch to add a call to nsDOMMediaStream that will merge two streams into one; we don't need to make a whole new class just for that.
Attachment #624053 - Attachment is obsolete: true
Attachment #624053 - Flags: review?(roc)
Attachment #624053 - Flags: review?(fabrice)
Attachment #624570 - Flags: review?(roc)
Attachment #624570 - Flags: review?(fabrice)
Comment on attachment 624570 [details] [diff] [review]
C++ Interface - v3

Review of attachment 624570 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaEngine.h
@@ +28,5 @@
> +  /* Populate an array of video sources in the nsTArray. Also include devices
> +   * that are currently unavailable. */
> +  virtual nsresult EnumerateVideoDevices(
> +    nsTArray<nsAutoPtr<MediaEngineVideoSource> >*
> +  ) = 0;

uugly! Just put it on one line. Also, can this fail and can we handle the failure in a meaningful way? If not, just return void.

@@ +52,5 @@
> +  /* Populate the UUID of this device in the nsAString */
> +  virtual void GetUUID(nsAString&) = 0;
> +
> +  /* This call reserves but does not start the device. */
> +  virtual nsCOMPtr<nsDOMMediaStream> Allocate() = 0;

already_AddRefed

@@ +55,5 @@
> +  /* This call reserves but does not start the device. */
> +  virtual nsCOMPtr<nsDOMMediaStream> Allocate() = 0;
> +
> +  /* Release the device back to the system. */
> +  virtual nsresult Deallocate() = 0;

Can this just return void?

@@ +60,5 @@
> +
> +  /* Start the device and add the track to the provided SourceMediaStream, with
> +   * the provided TrackID. You may start appending data to the track
> +   * immediately after. */
> +  virtual nsresult Start(SourceMediaStream*, TrackID) = 0;

Can this just return void?

@@ +63,5 @@
> +   * immediately after. */
> +  virtual nsresult Start(SourceMediaStream*, TrackID) = 0;
> +  
> +  /* Stop the device and release the corresponding MediaStream */
> +  virtual nsresult Stop() = 0;

Can this just return void?

@@ +81,5 @@
> +
> +struct MediaEngineVideoOptions {
> +  unsigned int mWidth;
> +  unsigned int mHeight;
> +  unsigned int mMaxFPS;

Prefer explicit sizes here, so PRUint32 or uint32_t.
Comment on attachment 624570 [details] [diff] [review]
C++ Interface - v3

Review of attachment 624570 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaEngine.h
@@ +52,5 @@
> +  /* Populate the UUID of this device in the nsAString */
> +  virtual void GetUUID(nsAString&) = 0;
> +
> +  /* This call reserves but does not start the device. */
> +  virtual nsCOMPtr<nsDOMMediaStream> Allocate() = 0;

Allocate()/DeAllocate() and Start()/Stop() should contain a parameter identifying the device to perform the operation on?
Also, should there be error codes for the various failures of these operations?

@@ +60,5 @@
> +
> +  /* Start the device and add the track to the provided SourceMediaStream, with
> +   * the provided TrackID. You may start appending data to the track
> +   * immediately after. */
> +  virtual nsresult Start(SourceMediaStream*, TrackID) = 0;

Where is the ownership for the SourceMediaStream managed?

@@ +93,5 @@
> +  
> +  /* Return a MediaEngineVideoOptions struct with appropriate values for all
> +   * fields. */
> +  virtual MediaEngineVideoOptions GetOptions() = 0;
> +};

Should we need SetOptions()?
Attached patch C++ Interface - v3 (obsolete) — Splinter Review
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)
> ::: content/media/MediaEngine.h
> @@ +28,5 @@
> > +  /* Populate an array of video sources in the nsTArray. Also include devices
> > +   * that are currently unavailable. */
> > +  virtual nsresult EnumerateVideoDevices(
> > +    nsTArray<nsAutoPtr<MediaEngineVideoSource> >*
> > +  ) = 0;
> 
> uugly! Just put it on one line. Also, can this fail and can we handle the
> failure in a meaningful way? If not, just return void.

I wanted to make it fit within the 80-char limit, but it looks like we don't have that rule anymore; I moved it to fit on one line.

Also changed the return type to void; even though the call can fail, the caller can't do much about it.


> > +  /* Release the device back to the system. */
> > +  virtual nsresult Deallocate() = 0;
> > +  virtual nsresult Start(SourceMediaStream*, TrackID) = 0;
> > +  virtual nsresult Stop() = 0;
> 
> Can this just return void?

All of these calls can fail if they are called out of order (i.e. you can't call Deallocate() without calling Stop() first, etc.). These cases could be handled by logging and returning void, I suppose, but I don't see the harm in returning an nsresult to stay consistent with the rest of the family.

In the case of Start(), the return value is actionable by the caller.

(In reply to Suhas from comment #33)
> > +  /* Populate the UUID of this device in the nsAString */
> > +  virtual void GetUUID(nsAString&) = 0;
> > +
> > +  /* This call reserves but does not start the device. */
> > +  virtual nsCOMPtr<nsDOMMediaStream> Allocate() = 0;
> 
> Allocate()/DeAllocate(), Start()/Stop() should contain parameter identifying
> the device to perform the operation on ??

The methods are on a MediaEngineSource object, which identifies the device.

> Also should there be error codes for various failures for these operations ?

For Allocate and Start, it might make sense to have error codes, so we can inform the user if the call failed because the device is in use by another program, or because of a hardware failure. Let's start with the simple case now, and add those error codes when the DOM bindings get fleshed out a bit more.

> @@ +60,5 @@
> > +
> > +  /* Start the device and add the track to the provided SourceMediaStream, with
> > +   * the provided TrackID. You may start appending data to the track
> > +   * immediately after. */
> > +  virtual nsresult Start(SourceMediaStream*, TrackID) = 0;
> 
> Where is the ownership for the SourceMediaStream managed ..

SourceMediaStreams are tightly coupled with nsDOMMediaStream objects, as I understand it. In most cases, the caller will simply pass a SourceMediaStream that it obtained from an nsDOMMediaStream with the preceding Allocate() call. The reason this is not just an nsDOMMediaStream is to allow cases where two nsDOMMediaStreams need to be mixed into one; we can change this when we add that functionality there.

If an nsDOMMediaStream is released, associated SourceMediaStreams should be released too. Is that correct, roc?
 
> @@ +93,5 @@
> > +  
> > +  /* Return a MediaEngineVideoOptions struct with appropriate values for all
> > +   * fields. */
> > +  virtual MediaEngineVideoOptions GetOptions() = 0;
> > +};
> 
> Should we need SetOptions()?

We might at a later point when we implement the constraints API. But, for now, each hardware device can have more than one MediaEngineSource representing it, so the caller can simply choose the one closest to what it wants by using GetOptions() on all the objects returned by EnumerateVideoDevices.
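
To illustrate that last point, a hypothetical selection loop (field names follow the MediaEngineVideoOptions struct reviewed above; the thresholds and variable names are made up, and 'sources' is the array filled in by EnumerateVideoDevices):

  MediaEngineVideoSource* picked = nsnull;
  for (PRUint32 i = 0; i < sources.Length(); i++) {
    MediaEngineVideoOptions opts = sources[i]->GetOptions();
    // Take the first device that advertises at least 640x480 at 30 fps.
    if (opts.mWidth >= 640 && opts.mHeight >= 480 && opts.mMaxFPS >= 30) {
      picked = sources[i];
      break;
    }
  }
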
Attachment #624570 - Attachment is obsolete: true
Attachment #624570 - Flags: review?(roc)
Attachment #624570 - Flags: review?(fabrice)
Attachment #624807 - Flags: review?(roc)
(In reply to Anant Narayanan [:anant] from comment #31)

> > > +enum mediaEngineVideoCodecType {
> > > +  kVideoCodecH263,
> > > +  kVideoCodecVP8,
> > > +  kVideoCodecI420
> > 
> > What is this for actually?
> 
> Sometimes the caller may want to pick one device over another if they
> advertise that they able to encode to a particular codec "natively". Most
> cameras just return i420, but you never know...

Right - for example, the Logitech 915 and 920 webcams have built-in H.264 encoders, and the webrtc code is at least in theory set up to handle this.  Of course, I'd prefer built-in VP8, but the point is we need to expose it and key on it.
(In reply to Anant Narayanan [:anant] from comment #34)
> I wanted to make it fit with the 80-char limit, but looks like we don't have
> that rule anymore; moved it to fit in one line.

We do have an 80-column rule, but in extreme cases like this one, just break it.

> All of these calls can fail, if they are called out of order (i.e. you can't
> call Deallocate() without calling Stop() first, etc). These cases can be
> handled by logging and returning void, I suppose, but I don't see the harm
> in returning an nsresult to stay consistent with the rest of the family.
> 
> In the case of Start(), the return value is actionable by the caller.

OK.
Attached file C++ Interface - v3.1 (obsolete) —
This minor tweak to the interface changes the nsTArray passed to the enumerate functions to contain nsRefPtrs instead of nsAutoPtrs. This makes it slightly easier to write backends (as most of them will use timers referring to themselves as callbacks), at the cost of doing some refcounting.
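
A sketch of the backend shape that motivates this (the class name is hypothetical; the point is that a source which drives itself from an nsITimer must be refcounted, since the timer holds a reference back to it):

  class MediaEngineDefaultVideoSource : public MediaEngineVideoSource,
                                        public nsITimerCallback
  {
  public:
    NS_DECL_ISUPPORTS
    NS_DECL_NSITIMERCALLBACK        // Notify(nsITimer*) would push the next frame
  private:
    nsCOMPtr<nsITimer> mTimer;      // periodically fires back into this source
  };
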
We're going to support still image capture via the {picture:true} argument to getUserMedia. For this purpose, MediaEngine will need an additional interface by which the DOM code can request backends to provide a picture.

The immediate need is from bug 738528.
Attached patch C++ Interface - v3.2 (obsolete) — Splinter Review
We made two changes since this was last reviewed, so flagging again. Changelog:

- Made the array of sources passed to Enumerate use RefPtrs instead of AutoPtrs.
- Added a Snapshot() method to allow taking still pictures from video sources (and potentially snippets of audio from audio sources).
Attachment #624807 - Attachment is obsolete: true
Attachment #626122 - Attachment is obsolete: true
Attachment #629035 - Flags: review?(roc)
Comment on attachment 629035 [details] [diff] [review]
C++ Interface - v3.2

Review of attachment 629035 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaEngine.h
@@ +62,5 @@
> +  /* Take a snapshot from this source. In the case of video this is a single
> +   * image, and for audio, it is a snippet lasting aDuration milliseconds. The
> +   * duration argument is ignored for a MediaEngineVideoSource.
> +   */
> +  virtual nsresult Snapshot(PRUint32 aDuration, nsILocalFile** aFile) = 0;

Why do you want to return a file here?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #40)
> > +  /* Take a snapshot from this source. In the case of video this is a single
> > +   * image, and for audio, it is a snippet lasting aDuration milliseconds. The
> > +   * duration argument is ignored for a MediaEngineVideoSource.
> > +   */
> > +  virtual nsresult Snapshot(PRUint32 aDuration, nsILocalFile** aFile) = 0;
> 
> Why do you want to return a file here?

This maps fairly directly to the JS-DOM API, which receives a Blob in its callback. Using a file lets us not worry about the format the backend decides to return the data in. We can later introspect whether we got RAW or PNG; or WAV or OGG.

It's also simpler: since snapshots are self-contained and "static", a file seemed like the best way to represent them.
(In reply to Anant Narayanan [:anant] from comment #41)
> This maps fairly directly to the JS-DOM API which receives a Blob in its
> callback. Using a file let's us not worry about the format the backend
> decides to return the data in. We can later introspect if we got RAW, PNG;
> or WAV, OGG.
> 
> It's also simpler, since snapshots are self contained and "static", it
> seemed like the best way to represent them.

But why would you want to do I/O?

And files aren't typed, so it seems to me you're losing information since there's no transmission of the type from the backend to other code. Sure, we can guess based on the file extension, but that seems wrong.

I'm not quite sure of the right thing to do here. Could we just return a byte array and a type string?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #42)

> I'm not quite sure of the right thing to do here. Could we just return a
> byte array and a type string?

or a DOMBlob? (or a DOMFile that could be memory-based, see https://mxr.mozilla.org/mozilla-central/source/content/base/public/nsDOMFile.h#255)
We actually do convert the nsILocalFile into a DOMFile later on, in the DOM binding:
https://github.com/anantn/gum/blob/master/patches/media_dom#L444

We can certainly move that conversion lower down (for Android), while leaving this interface more generic for other backends. Looks like it has a contentType attribute too.
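
A minimal sketch of the adjusted entry point (an assumption based on the discussion above and the follow-up attachment; the backend can then hand back a memory-backed DOM file with an explicit contentType instead of touching disk):

  virtual nsresult Snapshot(PRUint32 aDuration, nsIDOMFile** aFile) = 0;
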
Use an nsIDOMFile instead of an nsILocalFile.
Attachment #629035 - Attachment is obsolete: true
Attachment #629035 - Flags: review?(roc)
Attachment #629100 - Flags: review?(roc)
blocking-basecamp: --- → ?
blocking-kilimanjaro: --- → ?
Comment on attachment 629100 [details] [diff] [review]
Fallback Media Engine - v3.3

Review of attachment 629100 [details] [diff] [review]:
-----------------------------------------------------------------

Roc r+'d this before, and the changes are again very small and at roc's request. r+ = jesup, but leaving roc on just in case he wants to check.
Attachment #629100 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/b08466d9d710
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
Comment on attachment 629100 [details] [diff] [review]
Fallback Media Engine - v3.3

Bug landed, clearing flag.
Attachment #629100 - Flags: review?(roc)
blocking-basecamp: ? → -
Whiteboard: [qa-]