Closed Bug 749886 Opened 9 years ago Closed 3 years ago

B2G still image support for getUserMedia (getUserMedia({picture:true}))

Categories

(Core :: DOM: Device Interfaces, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
blocking-kilimanjaro +
blocking-basecamp -

People

(Reporter: mreavy, Assigned: mikeh)

References

Details

Need an implementation for B2G similar to the implementation for Android in Bug 738528.

This is based on the proposal here: http://lists.w3.org/Archives/Public/public-media-capture/2012Mar/0008.html
Blocks: 749875
That proposal is overly simplistic: it doesn't allow content to twiddle knobs like auto-focus, focus area, white balance, ISO speed, etc.  That prevents (privileged?) content from creating alternative photo apps that do things like WebGL filters etc., leaving users with whatever the browser chrome has decided to implement.

There's another problem here, which is that b2g doesn't have browser chrome :).
The API proposed above isn't targeted at privileged camera replacement apps. For camera apps, you'll need MediaStreams anyway, as you may want to perform processing on the live preview of the camera, which isn't possible without a full-fledged getUserMedia implementation.

The still image API is strictly for untrusted web content looking to get a quick picture off the device. Examples include Facebook profile picture uploads and Instagram-like apps. This API should ideally work in the B2G browser app, and in B2G apps that choose to use the API.

I have a few ideas as to what the privileged API might look like; we've been discussing constraints & capabilities in the W3C group, which allow enumeration of devices and of the kinds of things you'd like to do in a camera app. However, I think we should discuss that API in another bug dedicated to bringing the full getUserMedia to B2G.

I should add that the reason for splitting functionality into different rings this way is simply to aid implementation. We can start with the small, easy, low-hanging fruit and then work our way up towards the more complicated APIs.
My plan for b2g is as follows:

- still image API (and <input accept="image/*">) will be implemented by firing an activity to ask the camera app to send it back a picture. Some details need to be defined, like how to pass a Blob around (one option is to use the MediaStorage api)

- the camera app needs at least live preview with MediaStreams, and some finer control (autofocus, full resolution still capture, etc). We'll need an api somewhat like https://wiki.mozilla.org/WebAPI/CameraControl, and expose that on the MediaStream itself or on the <video> element when it plays a MediaStream.
roc's proposal (which I like) was to have a stream subclass that exposed the (mostly) still-image-specific knobs like focus-ready, flash, etc.
I'm wondering if it really makes sense to subclass a stream to provide all of those still-camera frobs.  To me, it seems like a handset/computer would have one or more Camera objects, some of which may expose methods like getViewfinderMediaStream() and getFullMediaStream(), in addition to getStillImage(); and methods like setFocusMode() or setZoom() will affect all of those simultaneously.
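A minimal sketch of the Camera-object shape suggested above, in plain JavaScript. Everything here is hypothetical: the `Camera` class, the plain-object "streams", and the setting names are stand-ins for illustration, not an existing Gecko or W3C API. What it shows is that settings live on the camera, so every stream handed out observes the same focus mode and zoom.

```javascript
// Hypothetical Camera object: settings are shared by every stream it hands out.
// Nothing here is a real Gecko/W3C API; streams are plain objects for illustration.
class Camera {
  constructor() {
    this.settings = { focusMode: "auto", zoom: 1.0 };
    this.openStreams = [];
  }

  // Low-resolution stream intended for a live preview.
  getViewfinderMediaStream() {
    return this._open("viewfinder");
  }

  // Full-resolution stream, only opened once somebody asks for it.
  getFullMediaStream() {
    return this._open("full");
  }

  setFocusMode(mode) {
    this.settings.focusMode = mode;
  }

  setZoom(factor) {
    this.settings.zoom = factor;
  }

  // A still capture records a copy of the settings in effect at capture time.
  getStillImage() {
    return { kind: "still", settings: { ...this.settings } };
  }

  _open(kind) {
    const camera = this;
    const stream = {
      kind,
      // Streams read the camera's live settings rather than keeping a copy.
      get settings() {
        return camera.settings;
      },
    };
    this.openStreams.push(stream);
    return stream;
  }
}

const camera = new Camera();
const preview = camera.getViewfinderMediaStream();
const full = camera.getFullMediaStream();
camera.setZoom(2.0);
console.log(preview.settings.zoom, full.settings.zoom);
```

Both streams report the zoom factor that was set once on the camera, which is the behaviour the comment argues for.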
A "camera stream" is probably not the way to go since getUserMedia can return a single stream with separate tracks for different cameras and microphones. You may instead want to put camera-specific APIs on the tracks.

Do any devices that we care about actually have separate CCDs for the viewfinder? I don't think we need separate video tracks for the same camera.
I've been assuming the privileged still-image consumer would be written something like the following

  <video id="preview"></video>
  <input id="iso" onchange="setISO(this.value)">
  <!-- ... -->
  <button onclick="snapshot()">Click</button>

  var cameraStream = navigator.getUserMedia('still-camera');  // whatever
  preview.src = cameraStream;

  function setISO(value) {
    cameraStream.iso = value;
  }
  //...

  function snapshot() {
    var req = cameraStream.focusAndCapture();  // whatever
    req.onsuccess = function() {
      // do something with req.result, a Blob at full-res
    }
  }

Does this approximately match up with what other folks have in mind?  If not, how not?
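To make the shape of that snippet concrete, here is a minimal sketch that runs outside a browser against a stub. `makeStubCameraStream`, the `iso` property, and `focusAndCapture()` are hypothetical and mirror the "whatever" placeholders in the comment; the request object only imitates the `onsuccess`/`result` callback pattern and is not a real DOMRequest, and the "picture" is a plain object rather than a Blob.

```javascript
// Stub standing in for the hypothetical privileged camera stream.
// All names are assumptions taken from the snippet above, not a shipped API.
function makeStubCameraStream() {
  return {
    iso: "auto",
    focusAndCapture() {
      const isoAtCapture = this.iso; // settings are read when capture starts
      const req = { result: null, onsuccess: null, onerror: null };
      // Complete asynchronously, as a real capture would.
      setTimeout(() => {
        // A real implementation would produce a full-resolution Blob here.
        req.result = { type: "image/jpeg", iso: isoAtCapture };
        if (typeof req.onsuccess === "function") {
          req.onsuccess.call(req);
        }
      }, 0);
      return req;
    },
  };
}

const cameraStream = makeStubCameraStream();

function setISO(value) {
  cameraStream.iso = value;
}

// Wrap the callback-style request in a Promise for easier chaining.
function snapshot() {
  return new Promise((resolve, reject) => {
    const req = cameraStream.focusAndCapture();
    req.onsuccess = function () {
      resolve(this.result);
    };
    req.onerror = function () {
      reject(new Error("capture failed"));
    };
  });
}

setISO("400");
const pending = snapshot();
pending.then((picture) => console.log(picture.type, picture.iso));
```

Wrapping the request in a Promise is a convenience of this sketch only; the original snippet uses the callback form directly.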
(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> 
> Does this approximately match up with what other folks have in mind?  If
> not, how not?

That matches my thoughts, yes.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #6)
>
> A "camera stream" is probably not the way to go since getUserMedia can return a
> single stream with separate tracks for different cameras and microphones. You may
> instead want to put camera-specific APIs on the tracks.
>
> Do any devices that we care about actually have separate CCDs for the
> viewfinder? I don't think we need separate video tracks for the same camera.

From looking through the underlying camera library, it looks like the driver can provide simultaneous low-res viewfinder and high-res capture streams, in which case multiple video tracks seems like the way to go.
Blocks: 749757
No longer blocks: 749875
(In reply to Mike Habicher [:mikeh] from comment #9)
> From looking through the underlying camera library, it looks like the driver
> can provide simultaneous low-res viewfinder and high-res capture streams, in
> which case multiple video tracks seems like the way to go.

I assume that the low-res and high-res streams are to allow apps to save power by only grabbing the low-res stream? In that case we probably need to be able to request separate low-res and high-res MediaStreams instead of making them tracks of the same stream, so we're not always capturing both.

(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> I've been assuming the privileged still-image consumer would be written
> something like the following

Looks reasonable.

I suppose we could add a method to the still-camera/viewfinder stream that returns a MediaStream containing the full-res stream, so apps can switch from the low-res preview to a high-res preview without re-requesting getUserMedia --- if any want to.
blocking-kilimanjaro: --- → +
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10)
> (In reply to Mike Habicher [:mikeh] from comment #9)
> > From looking through the underlying camera library, it looks like the driver
> > can provide simultaneous low-res viewfinder and high-res capture streams, in
> > which case multiple video tracks seems like the way to go.
> 
> I assume that the low-res and high-res streams are to allow apps to save
> power by only grabbing the low-res stream? In that case we probably need to
> be able to request separate low-res and high-res MediaStreams instead of
> making them tracks of the same stream, so we're not always capturing both.

I think it's because the camera module can generate the low-res preview itself using a dedicated hardware engine, in parallel with the high-res stream, thereby saving CPU cycles on the main processor.
blocking-kilimanjaro: + → ---
We can't expect to always be able to read either the full-res ("capture", I guess would be a better name) *or* the preview stream.  At least with some video decoder hardware, most likely encoders too.

And yes, that will result in broken CSS semantics in general.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #10)
> I suppose we could add a method to the still-camera/viewfinder stream that
> returns a MediaStream containing the full-res stream, so apps can switch
> from the low-res preview to a high-res preview without re-requesting
> getUserMedia --- if any want to.

I was kinda hoping we could do all this magically in Gecko based on other stream state.
Choose based on the resolution of the stream sink's video, or automatically switch when capture / auto-focus starts.  For example.
That gets hairy. What if it's just being streamed out over WebRTC or used in some other situation where the desired size is not obvious? I'd rather let the author control it.
If it's being streamed out, why do you think the author is in a better position to decide than Gecko?

If the target size isn't obvious, gecko can arbitrarily pick and authors have to deal, like CSS default sizes.  It should be easy for them to hint at desired size, like through video resolution.  I still think gecko is probably in a better position to make that decision when size/res is unspecified.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #17)
> If it's being streamed out, why do you think the author is in a better
> position to decide than Gecko?

Because the author probably has some idea of what the video will be used for and what's at the other end.

> If the target size isn't obvious, gecko can arbitrarily pick and authors
> have to deal, like CSS default sizes.  It should be easy for them to hint at
> desired size, like through video resolution.  I still think gecko is
> probably in a better position to make that decision when size/res is
> unspecified.

We could do it that way. We'll have to propagate signals backwards through the media stream graph. But we will definitely need a way for authors to express resolution requests and switch them on the fly. I guess that could just be another method on the camera stream.
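One way to reconcile the two positions, sketched under assumptions: the engine picks a default when the author gives no hint, while an explicit request from the author wins and can be re-evaluated whenever it changes. The mode list, `pickCaptureMode`, and the hint shape are hypothetical and only illustrate one possible selection rule (the smallest mode that covers the requested size).

```javascript
// Hypothetical capture modes a camera driver might offer, smallest first.
const MODES = [
  { width: 320, height: 240 },
  { width: 640, height: 480 },
  { width: 1280, height: 720 },
  { width: 2592, height: 1944 },
];

// Pick the smallest mode that covers the hinted size.
// With no hint, fall back to an arbitrary engine default (here: the smallest mode).
// If nothing is large enough, use the largest mode available.
// Assumes `modes` is sorted from smallest to largest.
function pickCaptureMode(modes, hint) {
  if (!hint) {
    return modes[0];
  }
  const fits = modes.find(
    (m) => m.width >= hint.width && m.height >= hint.height
  );
  return fits || modes[modes.length - 1];
}

console.log(pickCaptureMode(MODES, null));
console.log(pickCaptureMode(MODES, { width: 600, height: 400 }));
console.log(pickCaptureMode(MODES, { width: 4000, height: 3000 }));
```

Calling `pickCaptureMode` again with a new hint models switching resolution on the fly; how the hint would actually propagate back through the media stream graph is left open here, as it is in the discussion.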
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #18)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #17)
> > If it's being streamed out, why do you think the author is in a better
> > position to decide than Gecko?
> 
> Because the author probably has some idea of what the video will be used for
> and what's at the other end.
> 

Yes, but Gecko also knows the instantaneous network throughput and so forth.  (If that's already covered by something higher level, sorry, I'm not overly familiar with WebRTC.)
blocking-basecamp: --- → ?
blocking-kilimanjaro: --- → ?
Required for basic camera applications that are not trusted.
blocking-basecamp: ? → +
blocking-kilimanjaro: ? → +
Assignee: nobody → fabrice
Assignee: fabrice → mhabicher
(In reply to Dietrich Ayala (:dietrich) from comment #20)
> Required for basic camera applications that are not trusted.

Per discussions on email threads, I'm putting this back in the nomination queue for analysis. I really need to understand better why this blocks, as I am not sure if it really needs to.

Note - Support for untrusted camera applications probably should not be a blocker for basecamp (that's part of the "building the global marketplace" P2 objective). What really determines whether this blocks is whether gaia's camera app needs this implementation to exist.
blocking-basecamp: + → ?
Mike H's response for context btw:

"getUserMedia() support is _not_ required for the B2G camera to function.  We have implemented a low-level camera control API (which landed last night: https://hg.mozilla.org/integration/mozilla-inbound/rev/d59f932deea9 and https://hg.mozilla.org/integration/mozilla-inbound/rev/b13039139d7b ) that exposes detailed features of the camera, beyond gUM().  The intent is that gUM() can evolve as per the WebRTC requirements, and camera control can change as we need it to for the B2G camera."
Whiteboard: [blocked-on-input]
should be blocking-basecamp -'d. 

TL;DR -- Latest discussion via email between jonas and anant indicates this feature should be punted to v.next for b2g.


On 08/01/2012 05:10 PM, Jonas Sicking wrote:
> The Camera Control API that we have in B2G was added there because we
> needed an API *now* which we could use to implement the camera app.
> I'd much rather keep it at that, and as a source of feedback, and
> instead let getUserMedia evolve uninhibited until it supports the
> requirements for implementing a camera app.

That seems reasonable. My motivations were two-fold:

1. To be able to point developers at getUserMedia when asked the question: "how do I make an alternate camera app?", instead of them having to look at the source of our camera app to find CameraControl instead.

2. To broaden usage of getUserMedia({picture:true}), which is already implemented on Desktop & Android and get some more feedback.

It sounds like it is too late in the game to do (1) any more, and that we will have to go ahead and retain the CameraControl API. I will continue efforts in parallel to add whatever features are necessary to build a camera app using getUserMedia, and we can hopefully switch over when it's ready.

As for (2), simply adding syntactic sugar to the already existing web activity should be more than sufficient. This bug https://bugzilla.mozilla.org/show_bug.cgi?id=749886 covers that feature, do let me know if there's anything we can do to move that along.

Thanks,
-Anant
Whiteboard: [blocked-on-input]
Tony, bug 752352 can be punted for now, but we'd still like to get this bug in for v1 of B2G - it covers point (2) described above - a getUserMedia based syntax for the web activity to capture a still picture.
(In reply to Anant Narayanan [:anant] from comment #24)
> Tony, bug 752352 can be punted for now, but we'd still like to get this bug
> in for v1 of B2G - it covers point (2) described above - a getUserMedia
> based syntax for the web activity to capture a still picture.

Right (I don't think anyone right now is disagreeing that we should do this). Let me try to be direct on this:

Would we block shipping version one of the phone if this wasn't implemented?

The target feature in question involves the web activity for capturing the picture (if I'm correct). I have a question based on that discussion:

Is the web activity support implemented right now (meaning, could a browser technically make use of it)? And is the goal of this to provide a more common API that encourages wider use of the activity itself and of getUserMedia?
Summary: B2G still image support for getUserMedia → B2G still image support for getUserMedia (getUserMedia({picture:true}))
Marking this blocking- per product saying that this is not a v1 product requirement.

We already have support for the Camera Control API which we implemented because getUserMedia couldn't solve all the use cases that we had (you can't implement a camera app using getUserMedia currently).

We also didn't want to expand getUserMedia to the point of being able to support the use cases that we had since it would require pretty big changes to getUserMedia and we didn't want to get in the way of future standardized developments of getUserMedia.

That said, this being blocking- doesn't mean that we can't do it. Once we have a WebActivity for getting a picture, it might be very low cost and high value to implement getUserMedia({picture}) on top of that WebActivity.
blocking-basecamp: ? → -
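A rough sketch of how thin that layer could be, assuming a picture-picking activity already exists. `startActivity` is an injected, hypothetical launcher that returns a request object with `onsuccess`/`onerror` callbacks; the activity name `"pick"`, the `image/jpeg` type filter, and the `result.blob` field are assumptions for illustration, as is the stub used to run the example off-device.

```javascript
// Sketch: getUserMedia({picture: true}) layered on a picture-picking activity.
// `startActivity` is injected so the example runs without any device API.
function makeGetUserMedia(startActivity) {
  return function getUserMedia(constraints, onSuccess, onError) {
    if (!constraints || constraints.picture !== true) {
      onError(new Error("only {picture: true} is handled by this sketch"));
      return;
    }
    const req = startActivity({ name: "pick", data: { type: "image/jpeg" } });
    req.onsuccess = function () {
      onSuccess(this.result.blob);
    };
    req.onerror = function () {
      onError(new Error("the user cancelled or the activity failed"));
    };
  };
}

// Stub launcher: completes each request asynchronously with a placeholder blob.
function stubStartActivity(options) {
  const req = { result: null, onsuccess: null, onerror: null };
  setTimeout(() => {
    req.result = { blob: { type: options.data.type, size: 0 } };
    if (req.onsuccess) req.onsuccess.call(req);
  }, 0);
  return req;
}

const getUserMedia = makeGetUserMedia(stubStartActivity);
getUserMedia(
  { picture: true },
  (blob) => console.log("got picture of type", blob.type),
  (err) => console.log("failed:", err.message)
);
```

Injecting the launcher keeps the wrapper independent of whichever activity mechanism ends up being available, and makes the unsupported-constraint path easy to exercise.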
FxOS/Gonk has been removed from the codebase. Mass-invalidating FxOS related Device Interface bugs to clean up the component. 

If I incorrectly invalidated something, please let me know.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID
Bulk correction of resolution of B2G bugs to INCOMPLETE.
Resolution: INVALID → INCOMPLETE