[Madai][Camera][Gecko] Implement face-tracking API

RESOLVED FIXED in 1.4 S3 (14mar)

Status

defect
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: mikeh, Assigned: mikeh)

Tracking

({dev-doc-needed})

unspecified
1.4 S3 (14mar)
ARM
Gonk (Firefox OS)
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: PR ETA 3/5)

Attachments

(1 attachment, 15 obsolete attachments)

78.46 KB, patch
mikeh
: review+
Details | Diff | Splinter Review
Expose the camera face-tracking API to the DOM:
- capability-supported attribute
- enabled/disabled attribute
- callback with identified face information
I've already finished this implementation and then I'm starting testing it now to verify if the detected coordinates are correct.
I hope to upload my patch by this week.
However, as I am also currently implementing other features (APIs for ISO, OIS, and HDR)
It might be delayed to the next week.
Anyway, my due date is this week.
Assignee: mhabicher → hiro7998
Status: NEW → ASSIGNED
blocking-b2g: 1.4? → ---
Summary: [Camera][Gecko] Implement face-tracking API → [Madai][Camera][Gecko] Implement face-tracking API
Blocks: 971211
Youngwoo, 

Please provide the details of this implementation. From comment 1, it sounds like you may already have a changes ready for feedback from Mike.

Thanks
Hema
Flags: needinfo?(hiro7998)
I've not yet completed it. so I might not be able to share the codes.
So just I can share the interface of the APIs only.

CameraManager.webidl
  // Information of the detected face
  // The name, 'Face', is so general. should it be more camera-specific?
  dictionary Face {
    long left;
    long top;
    long right;
    long bottom;
    long score;
  }

CameraControl.webidl
  // Callback to notify the result as a sequence of Face and the number of the detected faces.
  // When starting the face-detection, this callback is passed to the camera module.
  callback CameraFaceDetectionCallback = void (sequence<Face> faces, long numFaces)

  interface CameraControl {
    // Make the face-detection start
    startFaceDetection(CameraFaceDetectionCallback aOnSuccess,
                       optional CameraErrorCallback aOnError)
    // Make the face-detection stop
    stopFaceDetection()
  }

CameraCapabilities.webidl
  CameraCapabilities {
    // If maxDetectedFaces is 0, face-detection is N/A feature in the device
    [Constant, Cached] readonly attribute unsigned long maxDetectedFaces;
  }  

If Mike gives your comments, it'll be very helpful to us.
The ambiguous things are the followings

1. Type of 'Face' : this includes info of one detected face, 
  => Its type is dictionary, is it OK?
2. Name of 'Face'
  => Does it need more camera-specific name like CameraFace or CameraDetectedFace?
3. The position to register a result callback
  => Passed as a parameter of startFaceDetection()
  => Is it better to be passed by a separate API like setFaceDetectionCallback?
This looks good. Specific feedback below.

(In reply to leo.bugzilla.gecko from comment #3)
>
> I've not yet completed it. so I might not be able to share the codes.
> So just I can share the interface of the APIs only.
> 
> CameraManager.webidl
>   // Information of the detected face
>   // The name, 'Face', is so general. should it be more camera-specific?
>   dictionary Face {
>     long left;
>     long top;
>     long right;
>     long bottom;
>     long score;
>   }

Please rename this to CameraFace.

Also, unless this dictionary is required to get the initial CameraControl object, it should be moved to CameraControl.webidl. Please change 'score' to 'weight' and if this is an unsigned value (i.e. 0..1000) please change its type to 'unsigned long'.

> CameraControl.webidl
>   // Callback to notify the result as a sequence of Face and the number of
> the detected faces.
>   // When starting the face-detection, this callback is passed to the camera
> module.
>   callback CameraFaceDetectionCallback = void (sequence<Face> faces, long
> numFaces)

Sequences should have a length property already; there is no need to return numFaces as a separate value.

>   interface CameraControl {
>     // Make the face-detection start
>     startFaceDetection(CameraFaceDetectionCallback aOnSuccess,
>                        optional CameraErrorCallback aOnError)
>     // Make the face-detection stop
>     stopFaceDetection()
>   }

The function callbacks are used to provide a one-shot result, in this case, whether or not we succeeded in starting the face detection process. After that, the face detection event callback can be called periodically to provide updates as needed. So:

callback CameraStartFaceDetectionCallback = void ();
callback CameraFaceDetectedCallback = void (sequence<CameraFace> faces);

interface CameraControl {
  startFaceDetection(optional CameraStartFaceDetectionCallback onSuccess, optional CameraErrorCallback onError);
  stopFaceDetection();
  attribute CameraFaceDetectedCallback? onFaceDetected;
} 

> CameraCapabilities.webidl
>   CameraCapabilities {
>     // If maxDetectedFaces is 0, face-detection is N/A feature in the device
>     [Constant, Cached] readonly attribute unsigned long maxDetectedFaces;
>   }  

Change "N/A" to "not supported".

> If Mike gives your comments, it'll be very helpful to us.
> The ambiguous things are the followings
> 
> 1. Type of 'Face' : this includes info of one detected face, 
>   => Its type is dictionary, is it OK?

See above.

> 2. Name of 'Face'
>   => Does it need more camera-specific name like CameraFace or
> CameraDetectedFace?

As noted above: CameraFace.

> 3. The position to register a result callback
>   => Passed as a parameter of startFaceDetection()
>   => Is it better to be passed by a separate API like
> setFaceDetectionCallback?

See above.
The below APIs are added.

interface CameraCapabilities
{
  [Constant, Cached] readonly attribute unsigned long maxDetectedFaces;
};
dictionary CameraPoint
{
  long x = 0;
  long y = 0;
};
interface CameraDetectedFace
{
  readonly attribute long id;
  readonly attribute long score;
  readonly attribute any  rect;
  readonly attribute any  leftEye;
  readonly attribute any  rightEye;
  readonly attribute any  mouth;
};
callback CameraFaceDetectionCallback = void (sequence<CameraDetectedFace> faces);
interface CameraControl
{
  StartFaceDetection();
  StopFaceDetection();
  attribute CameraFaceDetectionCallback? onFaceDetected;
};
Flags: needinfo?(hiro7998)
CameraDetectedFace is implemented as an interface type.
But I know it's better to be implemented as a dictionary type than an interface type.
So I tried to make it as a dictionary type, however, it caused a crash.
OnFaceDetected() function implemented in nsDOMCameraControl.cpp does not have a JSContext parameter.
So I tried to get a JSContext from the owner window(nsPIDOMWindow) and I succeed to get a JSContext
like that

  JSContext *cx;
  if (aWindow) {
    nsCOMPtr<nsIScriptGlobalObject> sgo(do_QueryInterface(aWindow));
    if (sgo) {
      cx = sgo->GetContext()->GetNativeContext();
    }
  }

But, while allocating a JSArray for an array of the faces using JS_NewArrayObject(), a crash occurred.
I think it's caused by a JSContext.
Can you help me to convert the type of CameraDetectedFace to dictionary?
Flags: needinfo?(mhabicher)
Comment on attachment 8380632 [details] [diff] [review]
Camera-Face-tracking-API.patch

Review of attachment 8380632 [details] [diff] [review]:
-----------------------------------------------------------------

This is an excellent first patch. Everything looks good except for the new interface where I think a dictionary should be sufficient. Let me look into that.

I'm not ready to give this a formal r+ or r- yet, but here is some feedback from my first pass through the code. Well done.

::: dom/bindings/Bindings.conf
@@ +193,5 @@
>      }
>  },
>  
> +'CameraDetectedFace': {
> +    'nativeType': 'mozilla::nsDOMCameraDetectedFace',

Is the 'mozilla' namespace prefix required?

::: dom/camera/CameraControlImpl.cpp
@@ +121,4 @@
>  }
>  
>  void
> +CameraControlImpl::OnFaceDetected(nsTArray<Face>& aFaces)

This should be 'const nsTArray<Face>& aFaces'.

::: dom/camera/DOMCameraControl.cpp
@@ +1294,5 @@
> +  MOZ_ASSERT(NS_IsMainThread());
> +
> +  nsCOMPtr<CameraFaceDetectionCallback> cb = mOnFaceDetectedCb;
> +  if (cb) {
> +    mozilla::dom::Sequence<mozilla::dom::OwningNonNull<nsDOMCameraDetectedFace> > faces;

The 'mozilla::dom' namespace prefix shouldn't be required here.

@@ +1297,5 @@
> +  if (cb) {
> +    mozilla::dom::Sequence<mozilla::dom::OwningNonNull<nsDOMCameraDetectedFace> > faces;
> +    uint32_t len = aFaces.Length();
> +    NS_ENSURE_TRUE_VOID(faces.SetCapacity(len));
> +    for (uint32_t i=0; i < len; i++) {

whitespace nit: "uint32_t i = 0"

::: dom/camera/DOMCameraControlListener.cpp
@@ +223,5 @@
> +{
> +  class Callback : public DOMCallback
> +  {
> +  public:
> +    Callback(nsMainThreadPtrHandle<nsDOMCameraControl> aDOMCameraControl, nsTArray<ICameraControl::Face>& aFaces)

I think the aFaces array should be const.

::: dom/camera/GonkCameraControl.cpp
@@ +1447,5 @@
>  
>  void
> +OnFaceDetected(nsGonkCameraControl* gc, camera_frame_metadata_t* aMetaData)
> +{
> +  DOM_CAMERA_LOGI("Camera face detected");

To increase the value of this log entry, we should at least log the number of faces around. Also, since this can be plural, "face" --> "face(s)"

@@ +1454,5 @@
> +  if (aMetaData) {
> +    uint32_t numFaces = aMetaData->number_of_faces;
> +    if (numFaces) {
> +      faces.SetCapacity(numFaces);
> +      ICameraControl::Face* r;

'r' for the faces looks a little funny. Make this local variable 'f' or 'face'.

@@ +1455,5 @@
> +    uint32_t numFaces = aMetaData->number_of_faces;
> +    if (numFaces) {
> +      faces.SetCapacity(numFaces);
> +      ICameraControl::Face* r;
> +      for (uint32_t i=0; i < numFaces; i++) {

whitespace nit: "uint32_t i = 0"

@@ +1478,5 @@
> +                         r->region.left, r->region.top, r->region.right, r->region.bottom,
> +                         r->leftEye.x, r->leftEye.y, r->rightEye.x, r->rightEye.y, r->mouth.x, r->mouth.y);
> +      }
> +    }
> +  }

All of the above translation code should be moved into nsGonkCameraControl::OnFaceDetected(). This wrappers are meant to be minimal, and will probably go away in the future.

void
OnFaceDetected(nsGonkCameraControl* gc, camera_frame_metadata_t* aMetaData)
{
  gc->OnFaceDetected(aMetaData);
}

::: dom/camera/GonkCameraHwMgr.cpp
@@ +284,5 @@
> +  int rv = OK;
> +
> +  rv = mCamera->sendCommand(CAMERA_CMD_START_FACE_DETECTION, CAMERA_FACE_DETECTION_HW, 0);
> +  if (rv != OK) {
> +    DOM_CAMERA_LOGE("mHardware->StartFaceDetection() failed with status %d", rv);

There is no 'mHardware'; you can just make this "StartFaceDetection() failed with status %d".

@@ +296,5 @@
> +{
> +  DOM_CAMERA_LOGI("%s\n", __func__);
> +  int rv = OK;
> +
> +

Please remove this extra blank line.

@@ +299,5 @@
> +
> +
> +  rv = mCamera->sendCommand(CAMERA_CMD_STOP_FACE_DETECTION, 0, 0);
> +  if (rv != OK) {
> +    DOM_CAMERA_LOGE("mHardware->StopFaceDetection() failed with status %d", rv);

There is no 'mHardware'; you can just make this "StopFaceDetection() failed with status %d".

::: dom/camera/ICameraControl.h
@@ +125,5 @@
> +  };
> +
> +  struct Face {
> +    int32_t id;
> +    int32_t score;

My comment in CameraControl.webidl about 'score' vs 'region.weight' applies here.

::: dom/webidl/CameraControl.webidl
@@ +133,5 @@
> +
> +/* The information of the each face detected by a camera device, e.g.
> +     {
> +       id: 1,
> +       score: 80,

I would like to remove 'score' and just include it as 'weight' in 'rect' items. That way, when applying the detected faces to the autofocus areas, we can just do:
  faceDetectCallback: function(faces) {
    CameraControl.autoFocusAreas = faces.rect;
  }

@@ +166,5 @@
> +   'rightEye' is the coordinates of the right eye. Description is same as 'leftEye'.
> +
> +   'mouth' is the coordinates of the mouth. Description is same as 'leftEye'.
> +*/
> +interface CameraDetectedFace

I think there should be a way to handle this as a dictionary. I will investigate.
Attachment #8380632 - Flags: feedback+
Comment on attachment 8380632 [details] [diff] [review]
Camera-Face-tracking-API.patch

Review of attachment 8380632 [details] [diff] [review]:
-----------------------------------------------------------------

Two more things:
1. If we decide to stick with detected faces as an interface, the interface will need to be added to dom/tests/mochitest/general/test_interfaces.html.
2. Regardless, before this can land it will need an automated test. Since I doubt the emulator supports face detection, we can capture Start/StopFaceDetection() in TestGonkCameraHardware.cpp/.h and inject fake callback events.
What do you think of the following?

dictionary CameraDetectedFace
{
  unsigned long         id = 0;
  CameraRegion          rect;
  sequence<CameraPoint> eyes;
  CameraPoint           mouth;
};

1. I think 'id' is meant to be unsigned.
2. 'rect' includes the 'score' member in its 'weight' value for the reason I mentioned in comment 7.
3. 'eyes' may have 0, 1, or 2 (or more!) items, but I assume it's not important to the UX to differentiate between left and right eyes.

The C++ for this looks like:

struct CameraDetectedFace : public DictionaryBase
{
  Optional<Sequence<CameraPoint > > mEyes;
  uint32_t mId;
  CameraPoint mMouth;
  CameraRegion mRect;

  // other members...
};

The callback signature in C++ looks like this:

  void Call(const Sequence<CameraDetectedFace >& faces, ErrorResult& aRv, ExceptionHandling aExceptionHandling = eReportExceptions);

The only drawback here is that 'mouth' is always present, even if the camera doesn't detect one. To address that with dictionaries, 'mouth' has to be of type 'any', which complicates things slightly. In that case, if we have no mouths detected, we set the member to JS::Undefined(); and if we do, then we can call ToObject() on the CameraPoint object:

  CameraDetectedFace face;
  AutoJSContext cx;
  if (mouthPoint.ToObject(cx, JS::NullPtr(), &face.mMouth)) {
    // success
  }

This minimizes the amount of manual JS object construction we need to do.
Flags: needinfo?(mhabicher) → needinfo?(hiro7998)
(In reply to Mike Habicher [:mikeh] from comment #7)

I've all modified the patch according to your comments. After I try to change CameraControl's type to a dictionary, I will upload a new patch.

> ::: dom/webidl/CameraControl.webidl
> @@ +133,5 @@
> > +
> > +/* The information of the each face detected by a camera device, e.g.
> > +     {
> > +       id: 1,
> > +       score: 80,
> 
> I would like to remove 'score' and just include it as 'weight' in 'rect'
> items. That way, when applying the detected faces to the autofocus areas, we
> can just do:
>   faceDetectCallback: function(faces) {
>     CameraControl.autoFocusAreas = faces.rect;
>   }
I think the above is a good idea, however I think the meanings of 'weight' and 'score' are different.
'weight' is a weight of unit of the coordinates but 'score' is the confidence level of the faces detected by a camera. So I think it's better for them to be separated. Could you consider it again please?

> > +interface CameraDetectedFace
> 
> I think there should be a way to handle this as a dictionary. I will
> investigate.

Cause I've another works, I've not tried it. so I will try it again today.
Flags: needinfo?(hiro7998)
I changed the type of 'CameraDetectedFace from interface to dictionary.
As a result, It became more simple.
And I modified almost things by Mikeh's comment #7.
However, because I have several things different with Mikeh's or ambiguous, I did not apply them to this patch.
They are the followings.
1. Using 'weight' instead of 'score'
2. Nullable attributes of left/right eyes and mouth

About 'weight' and 'score', I think they have different meanings.
They are similar but apparently different.
I confirmed it from the camera driver developers but it seems that the camera app developers have more knowledge about this. I will try it and then I will update it.

About the nullable attributes for two eyes and mouth.
According to http://androidxref.com/4.0.3_r1/xref/system/core/include/system/camera.h#233
 - rect and score is always supported
 - id is 0, if not supported
 - when left_eye, right_eye, and mouth are not supported, the values are not specified.
'mouth' and the eyes always are passed to app-side regardless of the detection.
App should check if the result values are meaningful or not. So we have only to pass all the values to app-side.
However, according to http://developer.android.com/reference/android/hardware/Camera.Face.html#leftEye.
Mouth and eyes can be null, if not supported.
I'm not sure how it's possible. But I expect it's dependent on each camera library. So currently I can not decide how I handle this.

And the test scripts and adding the fake-event code in TestGonkCameraHardware.cpp are not yet finished.
I will upload it soon.
Attachment #8380632 - Attachment is obsolete: true
(In reply to Youngwoo Jo from comment #11)
> 
> About the nullable attributes for two eyes and mouth.
> According to
> http://androidxref.com/4.0.3_r1/xref/system/core/include/system/camera.h#233
>  - rect and score is always supported
>  - id is 0, if not supported
>  - when left_eye, right_eye, and mouth are not supported, the values are not
> specified.
> 'mouth' and the eyes always are passed to app-side regardless of the
> detection.
> App should check if the result values are meaningful or not. So we have only
> to pass all the values to app-side.

From the above description, it sounds like there is a layer that is taking the missing values and creating "invalid" values to pass to the app side; in JS, missing values are indicated by setting the values to null.

> However, according to
> http://developer.android.com/reference/android/hardware/Camera.Face.
> html#leftEye. Mouth and eyes can be null, if not supported.

In that case, we should pass the null values through to JS.

> I'm not sure how it's possible. But I expect it's dependent on each camera
> library. So currently I can not decide how I handle this.

If the camera library does not return null, it must return some value that is clearly distinguishable as indicating "not present"--in that case, we should convert it to null for JS.
The current problem is that I don't know the conditions to check if it fails to find the eyes or the mouth in the preview. so I'm looking for the conditions. If I get the information, I will be able to apply it soon.
Camera app and framework in Android do not use left_eye, right_eye, and mouth.
I think the reason is that it's dependent on the camera solution.
Android camera library interface does not define the values of left_eye, right_eye, and mouth, in case of their failures of detection.
So I think the default implementation should return them as 'null' values in order to extend it as needed.
And the default implementation of Gaia camera app should check if it's null or should not use them(left_eye, right_eye, and mouth).
If the device vendor want to support them, they will implement Gonk dependent on their solutions.
What do you think about that?
I will apply this concept to my new patch.
Whiteboard: PR ETA 3/5
The followings are updated.
1. I used the value of { x: -2000, y: -2000 } as the condition to check 'not-supported' for the eyes and the mouth. In that case, each eye or the mouth is passed to app-side with the value of 'undefined'.
2. Because face-detection has been supported since Android ICS (API level 15), I added the check for ANDROID version.
3. I modified TestGonkCameraHardware(WIP) for the fake-event generator. However, without a test script.

I am currently writing a test script. I hope to upload a test script tomorrow.
Attachment #8383303 - Attachment is obsolete: true
Target Milestone: --- → 1.4 S3 (14mar)
Comment on attachment 8386121 [details] [diff] [review]
Camera-Face-tracking-API.patch, v1.3

Review of attachment 8386121 [details] [diff] [review]:
-----------------------------------------------------------------

A few small fixes and one bigger one: ICameraControl.h must not have any DOM or JS artefacts in it, such as including CameraControlBinding.h. I've noted the changes necessary in that file.

::: dom/camera/DOMCameraControl.cpp
@@ +1155,5 @@
> +    JSAutoCompartment ac(cx, cb->Callback());
> +    Sequence<CameraDetectedFace> faces;
> +    nsRefPtr<dom::CameraCapabilities> capabilities = Capabilities();
> +    uint32_t len = aFaces.Length();
> +    len = len > capabilities->MaxDetectedFaces() ? capabilities->MaxDetectedFaces() : len;

Do we expect that the camera library will ever return more detected faces than reported by CameraParameters::KEY_MAX_NUM_DETECTED_FACES_HW ? Even if it does, I don't think that's a problem, you can remove this test and just pass the entire array to the callback.

::: dom/camera/GonkCameraControl.cpp
@@ +986,5 @@
> +
> +  nsTArray<ICameraControl::Face> faces;
> +  if (aMetaData) {
> +    uint32_t numFaces = aMetaData->number_of_faces;
> +    if (numFaces) {

Please remove this unnecessary test. If numFaces == 0, the 'faces' array will get set to zero capacity and the for-loop won't execute, which is correct.

::: dom/camera/ICameraControl.h
@@ +8,5 @@
>  #include "nsCOMPtr.h"
>  #include "nsString.h"
>  #include "nsAutoPtr.h"
>  #include "nsISupportsImpl.h"
> +#include "mozilla/dom/CameraControlBinding.h"

This file (and everything below it) must not be dependent on any DOM symbols, including *Binding.h. Please remove this.

@@ +103,5 @@
> +      aRect.mTop.Construct(top);
> +      aRect.mLeft.Construct(left);
> +      aRect.mBottom.Construct(bottom);
> +      aRect.mRight.Construct(right);
> +    }

Without CameraControlBinding.h, you won't be able to use CameraRect here. This is fine--just move the ToDictionary() code to DOMCameraControl.cpp and leave struct Rect a simple structure.

In fact, please just use struct Region instead and set the weight=1000 if it's not being used.

@@ +135,5 @@
>    };
> +
> +  struct Point {
> +    enum {
> +      kNotSupportedValue = -2000

This is fine; consider adding a simple method to test for whether or not a Point is valid:
  bool IsPresent() {
    return x != kNotSupportedValue || y != kNotSupportedValue;
  }

@@ +139,5 @@
> +      kNotSupportedValue = -2000
> +    };
> +    int32_t x;
> +    int32_t y;
> +    void ToJSValue(JSContext* aCx, JS::Value* aPoint) const

Please move the JS- and DOM-specific code out of this header file and into DOMCameraControl.cpp/.h.

@@ +161,5 @@
> +    Rect     rect;
> +    Point    leftEye;
> +    Point    rightEye;
> +    Point    mouth;
> +    void ToDictionary(JSContext* aCx, dom::CameraDetectedFace& aFace) const

Please move the JS- and DOM-specific code out of this header file and into DOMCameraControl.cpp/.h.

::: dom/camera/TestGonkCameraHardware.cpp
@@ +146,5 @@
> +  public:
> +    enum TestType {
> +      FOUND_ONE_FACE,
> +      FOUND_TWO_FACES,
> +      NOT_FOUND_NULLABLES,

I was confused by what the NOT_FOUND_NULLABLES test did. Let's give this a better name, like FOUND_ONE_FACE_NO_DETAILS. Whatever you choose, please update the string value too, e.g. "face-detection-found-one-face-no-details"

@@ +147,5 @@
> +    enum TestType {
> +      FOUND_ONE_FACE,
> +      FOUND_TWO_FACES,
> +      NOT_FOUND_NULLABLES,
> +      NOT_FOUND_FACE,

NOT_FOUND_FACE --> NO_FACES_FOUND

@@ +148,5 @@
> +      FOUND_ONE_FACE,
> +      FOUND_TWO_FACES,
> +      NOT_FOUND_NULLABLES,
> +      NOT_FOUND_FACE,
> +      UNKNOWN_TYPE

UNKNOWN_TYPE --> UNKNOWN_TEST

@@ +153,5 @@
> +    };
> +
> +  public:
> +    FaceDetected(nsGonkCameraControl* aTarget, TestType aType)
> +      : mTarget(aTarget), mType(aType)

Please put the initializes on separate lines:
      : mTarget(aTarget)
      , mType(aType)

@@ +162,5 @@
> +    {
> +      camera_frame_metadata_t metadata;
> +      camera_face_t faces[2];
> +      switch (mType) {
> +      case FOUND_ONE_FACE:

Please indent all of the case statements 2 spaces, and put a blank line after each break statement for example:
  switch (value) {
    case VALUE_1:
      // code
      break;

    case VALUE_2:
      // mode code
      break;

    default:
      // default code
      break;
  }

@@ +215,5 @@
> +        metadata.faces[0].rect[0] = 3;
> +        metadata.faces[0].rect[1] = 4;
> +        metadata.faces[0].rect[2] = 5;
> +        metadata.faces[0].rect[3] = 6;
> +        //Nullable values set to 'not-supported' speicific values

Please add a space between // and comment text. Also, "specific" is spelled wrong.

@@ +229,5 @@
> +        metadata.faces = 0;
> +        break;
> +      default:
> +        DOM_CAMERA_LOGE("Failed to find the test type for face-detection (type:%d)\n", mType);
> +        break;

Do we want to return instead on invalid values?

@@ +236,5 @@
> +      return NS_OK;
> +    }
> +
> +  protected:
> +    nsGonkCameraControl* mTarget;

Change this to nsRefPtr<nsGonkCameraControl> mTarget;

@@ +242,5 @@
> +  };
> +
> +  FaceDetected::TestType type;
> +  const nsCString& tc = TestCase();
> +  if (tc.EqualsASCII("face-detection-found-one-face"))

Please put braces around all of these conditions, including the else case.

@@ +244,5 @@
> +  FaceDetected::TestType type;
> +  const nsCString& tc = TestCase();
> +  if (tc.EqualsASCII("face-detection-found-one-face"))
> +    type = FaceDetected::FOUND_ONE_FACE;
> +  else if (tc.EqualsASCII("face-detection-found-two-face"))

"face-detection-found-two-faces"

@@ +248,5 @@
> +  else if (tc.EqualsASCII("face-detection-found-two-face"))
> +    type = FaceDetected::FOUND_TWO_FACES;
> +  else if (tc.EqualsASCII("face-detection-not-found-nullable"))
> +    type = FaceDetected::NOT_FOUND_NULLABLES;
> +  else if (tc.EqualsASCII("face-detection-not-found-face"))

"face-detection-no-faces-found"

@@ +253,5 @@
> +    type = FaceDetected::NOT_FOUND_FACE;
> +  else
> +    type = FaceDetected::UNKNOWN_TYPE;
> +
> +  if (type != FaceDetected::UNKNOWN_TYPE)

Opening brace goes at the end of the line: if (...) {

@@ +270,5 @@
> +int
> +TestGonkCameraHardware::StopFaceDetection()
> +{
> +  if (IsTestCase("face-detection-found-one-face") ||
> +      IsTestCase("face-detection-found-two-face") ||

"face-detection-found-two-faces"

@@ +272,5 @@
> +{
> +  if (IsTestCase("face-detection-found-one-face") ||
> +      IsTestCase("face-detection-found-two-face") ||
> +      IsTestCase("face-detection-not-found-nullable") ||
> +      IsTestCase("face-detection-not-found-face"))

"face-detection-no-faces-found"

::: dom/webidl/CameraControl.webidl
@@ +355,5 @@
> +  /* Callback for face detected in the preview frame. this is called with the
> +     array of the faces detected by camera device continuously.
> +     Although nothing is detected, the callback is called with null of an array
> +     of the faces. */
> +  attribute CameraFaceDetectionCallback? onFaceDetected;

Sorry, I should have mentioned this sooner: can you please make this plural? --> "onFacesDetected"
When I pointed out this API on public-script-coord in the context of wanting nullable dictionaries in dictionaries, people were confused why we're using a dictionary for the return value at all where an interface object (with nice APIs like hasEyes(), say) would make more sense.
Another API design question, actually: is there a reason we're restricting to having only one consumer listening for face detection at once?  Or is there typically only one piece of code working with a given CameraControl object?
For what it's worth, I think Allen is right on public-script-coord and that we should be using an interface here, not a dictionary...
Youngwoo, my apologies, since I asked you to make the DetectedFaces a dictionary instead of an interface. If you like, I can make this change--just post your latest updated patch and I'll take it from there.
bz, would the interface have attributes for eyes and mouth that could return null; or as Allen suggested in public-script-coord, do you think there should be boolean methods (or readonly attributes?) for hasLeftEye, hasRightEye, and hasMouth ?
Flags: needinfo?(bzbarsky)
You want the former even if the latter API exists, for people who decide to .leftEye without checking hasLeftEye or whatnot.
Flags: needinfo?(bzbarsky)
It's a very minor fix. so I named the version v1.3.1.
And TestGonkCameraHardware and the test script are updated.
Attachment #8386121 - Attachment is obsolete: true
The above patch is written based on Bug 965421. They have the conflicts between them.
I suppose that Bug 965421 is applied first, because Bug 965421 is currently review+.
So if you build this patch, it requires the patch of Bug 965421.
Implements detected faces as an interface instead of a dictionary. This (finally) builds but is not yet tested.
Attachment #8387739 - Attachment is obsolete: true
Remove some stale code.
Assignee: hiro7998 → mhabicher
Attachment #8387985 - Attachment is obsolete: true
Fix syntax errors in the automated test. The tests still fail with a timeout.
Attachment #8387986 - Attachment is obsolete: true
Automated tests now run to completion successfully.

try-server push: https://tbpl.mozilla.org/?tree=Try&rev=db2fa75682dc&showall=1
Attachment #8387999 - Attachment is obsolete: true
Posted patch Part 2 - DOM, v1 (obsolete) — Splinter Review
With CameraDetectedFace implemented as an interface instead of a dictionary.
Attachment #8388020 - Flags: review?(bzbarsky)
Comment on attachment 8388019 [details] [diff] [review]
Part 1 - Camera, v1

dhylands: I've already been through several review cycles with Youngwoo, so this patch should be pretty mature by now. But since I took it over to convert the CameraDetectedFace dictionary (back) to an interface, it's probably a good time to ask someone else to look it over. What do you think?

If it helps, this is the interdiff between my latest version and Youngwoo's last version: https://bugzilla.mozilla.org/attachment.cgi?oldid=8387739&action=interdiff&newid=8388017&headers=1
Attachment #8388019 - Flags: review?(dhylands)
Fix test_interfaces.html and remove 'this' from the member initializer list to keep the WinXP compiler happy.

try-server push: https://tbpl.mozilla.org/?tree=Try&rev=81882fbf7ca2&showall=1
Attachment #8388017 - Attachment is obsolete: true
Posted patch Part 2 - DOM, v2 (obsolete) — Splinter Review
Updated patch for review.
Attachment #8388020 - Attachment is obsolete: true
Attachment #8388020 - Flags: review?(bzbarsky)
Attachment #8388110 - Flags: review?(bzbarsky)
Comment on attachment 8388019 [details] [diff] [review]
Part 1 - Camera, v1

Review of attachment 8388019 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me.

::: dom/camera/CameraControlImpl.cpp
@@ +140,5 @@
> +  RwLockAutoEnterRead lock(mListenerLock);
> +
> +  for (uint32_t i = 0; i < mListeners.Length(); ++i) {
> +    CameraControlListener* l = mListeners[i];
> +    l->OnFacesDetected(aFaces);

This means that we'll be calling the callback with the lock held. Is there any possibility of deadlock?

::: dom/camera/TestGonkCameraHardware.cpp
@@ +173,5 @@
> +
> +  nsresult
> +  ReleaseFacesArray()
> +  {
> +    if (mMetaData.faces) {

You're allowed to call delete with a nullptr.

::: dom/camera/GonkCameraControl.cpp
@@ +997,5 @@
> +FeatureDetected(int32_t feature[])
> +{
> +  /**
> +   * Where does the magic number -2000 come from?
> +   * http://androidxref.com/4.0.4/xref/system/core/include/system/camera.h#202

I think its just an arbitrary number outside the -1000 to +1000 range.

::: dom/camera/test/test_camera_hardware_face_detection.html
@@ +290,5 @@
> +
> +</script>
> +</body>
> +
> +</html>

I notice that there weren't any tests which covered receiving something that you weren't epecting - Maybe that doesn't make sense yet?
Attachment #8388019 - Flags: review?(dhylands) → review+
I'm sorry for the lag here.  I should be able to get to this on Wednesday...
(In reply to Dave Hylands [:dhylands] from comment #34)
> 
> ::: dom/camera/CameraControlImpl.cpp
> @@ +140,5 @@
> > +  RwLockAutoEnterRead lock(mListenerLock);
> > +
> > +  for (uint32_t i = 0; i < mListeners.Length(); ++i) {
> > +    CameraControlListener* l = mListeners[i];
> > +    l->OnFacesDetected(aFaces);
> 
> This means that we'll be calling the callback with the lock held. Is there
> any possibility of deadlock?

No. And not just because the DOM-facing callbacks all dispatch runnables to the Main Thread, so that this lock is held very briefly. The RwLockAutoEnterRead/Write classes wrap a PR_ReadWriteLock object that allows 0..N readers, exclusively-or 1 writer, and write locks are only ever attempted through a runnable dispatched to the Camera Thread. This means that even if (e.g.) OnAutoFocusComplete() runs on the Camera Thread and a registered listener tries to modify the listener array (by calling Add/RemoveListener()), the write lock won't be acquired until the current lock is released.

I've been asked this a few times now, so I'll add some comments to the code.

> ::: dom/camera/GonkCameraControl.cpp
> @@ +997,5 @@
> > +FeatureDetected(int32_t feature[])
> > +{
> > +  /**
> > +   * Where does the magic number -2000 come from?
> > +   * http://androidxref.com/4.0.4/xref/system/core/include/system/camera.h#202
> 
> I think its just an arbitrary number outside the -1000 to +1000 range.
> 
> ::: dom/camera/test/test_camera_hardware_face_detection.html
> @@ +290,5 @@
> > +
> > +</script>
> > +</body>
> > +
> > +</html>
> 
> I notice that there weren't any tests which covered receiving something that
> you weren't expecting - Maybe that doesn't make sense yet?

I'll make the code more conservative and modify one of the test cases to make sure we discard features with points outside of the range [-1000, 1000]. I'll also add a test case to clamp 'score' to 100,  Otherwise, 'id' is just a unique numeric identifier.
Incorporate review feedback.
Attachment #8388019 - Attachment is obsolete: true
Attachment #8388109 - Attachment is obsolete: true
bz: review-ping? Just wondering when you think you might have a chance to look at this.
Flags: needinfo?(bzbarsky)
See comment 35?  ;)
Flags: needinfo?(bzbarsky)
Comment on attachment 8388110 [details] [diff] [review]
Part 2 - DOM, v2

>+++ b/dom/webidl/CameraFaceDetection.webidl
>+interface CameraPoint

I believe this, like all the other camera stuff, should be conditioned on Navigator::HasCameraSupport.  Please file bugs on the parts of it that are not thus conditioned right now (e.g. window.CameraManager).

Also, once DOMPoint lands we may want to use it here.

>+   'bound' is the bounds of the face. This type is a CameraRegion with 'weight'
>+   value of 1000. It is guaranteed left < right and top < bottom.

This part of the comment doesn't match the IDL, right?

>+   may not be supported on all devices. If it is not supported, the value will
>+   be set to undefined.

As the IDL is currently written, 'leftEye' will never be undefined.  Did this comment mean null?

>+interface CameraDetectedFace

Again, condition on HasCameraSupport.

>+  readonly attribute unsigned long id;

We're presuming this won't overflow, right?  I guess that's a fairly safe assumption...

>+  readonly attribute DOMRect bound;

Why bound and not bounds?  Are we trying to match some other API here?

>+  readonly attribute CameraPoint? leftEye;

No one cares about the bounds of the detected feature, just the centerpoint?

>+partial interface CameraControl

I'm 99% sure this part needs to be in CameraControl.webidl to work correctly in non-clobber builds; the WebIDL build dependency tracking can't handle partial interfaces across multiple files, last I checked.

>+  [Throws]
>+  void startFaceDetection();

Document when/why this throws?

Also document that the callback will be called periodically (how often?) until stopFaceDetection is called?

>+  [Throws]
>+  void stopFaceDetection();

Again, document when/why it throws.

>+  /* Callback for face detected in the preview frame. this is called with the
>+     array of the faces detected by camera device continuously.
>+     Although nothing is detected, the callback is called with null of an array
>+     of the faces. */

Could you please fix the capitalization and grammar of this comment?  There's a bunch of issues there.

>+  attribute CameraFaceDetectionCallback? onFacesDetected;

It's a bit weird to have something other than an event handler have a name starting with "on"...  I'm not sure "facesDetected" or "facesDetectedCallback" is any better, though.

I assume in common usage this would be set once and then start/stop would happen a bunch of times, so making this an argument to startFaceDetection doesn't make sense?

>+++ b/dom/camera/DOMCameraDetectedFace.h
>+class DOMCameraPoint MOZ_FINAL : public nsISupports

Why not just CameraPoint for the class name?  It's already in the mozilla::dom namespace, so the extra "DOM" is redundant.  We generally try to have the C++ name match the WebIDL one, especially in new code.

This applies to the filename too.  Why can't it just be CameraDetectedFace.h/cpp ?

>+  DOMCameraPoint(nsISupports* aParent)

This constructor seems to be unused.

>+  DOMCameraPoint(nsISupports* aParent, const ICameraControl::Point& aPoint)

As does this one.

>+class DOMCameraDetectedFace MOZ_FINAL : public nsISupports

Again, why the extra "DOM"?

>+  DOMCameraDetectedFace(nsISupports* aParent)

Unused?

>+  already_AddRefed<dom::DOMRect> Bound();
>+  already_AddRefed<DOMCameraPoint> GetLeftEye();
>+  already_AddRefed<DOMCameraPoint> GetRightEye();
>+  already_AddRefed<DOMCameraPoint> GetMouth();

Seems to me like these can all return raw pointers, not already_AddRefed.

>+++ b/dom/camera/DOMCameraDetectedFace.cpp
>+NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_1(DOMCameraDetectedFace, mParent)

  NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_5(DOMCameraDetectedFace, mParent,
                                          mBound, mLeftEye, mRightEye, mMouth)

>+  // Apparently the WinXP compiler won't let us use 'this' in a
>+  // base member initializer list due to warning C4355.

See MOZ_THIS_IN_INITIALIZER_LIST(), if you do want to use it there.

>+DOMCameraDetectedFace::Bound()
>+{
>+  nsRefPtr<DOMRect> bound = mBound;
>+  return bound.forget();

So this could just be:

  return mBound;

if you changed the return value to be a DOMRect*.  Similar for the eyes and mouth.

>diff --git a/dom/camera/DOMCameraControl.h b/dom/camera/DOMCameraControl.h
>+  already_AddRefed<dom::CameraFaceDetectionCallback> GetOnFacesDetected();

Again, this can return a raw pointer.

>+++ b/dom/camera/DOMCameraControl.cpp

>+NS_IMPL_CYCLE_COLLECTION_INHERITED_20(nsDOMCameraControl, DOMMediaStream,

Where is NS_IMPL_CYCLE_COLLECTION_INHERITED_20 defined?

>+++ b/dom/camera/DOMCameraControlListener.cpp
>+      , mFaces(aFaces)

I assume these arrays are never long enough to worry about swapping instead of copying here?x

>+++ b/dom/tests/mochitest/general/test_interfaces.html

You won't need changes to this file if you mark the interfaces as conditional.

r=me with the above dealt with, and sorry again for the lag.
Attachment #8388110 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #40)
> Comment on attachment 8388110 [details] [diff] [review]
> Part 2 - DOM, v2
> 
> >+++ b/dom/webidl/CameraFaceDetection.webidl
> >+interface CameraPoint
> 
> I believe this, like all the other camera stuff, should be conditioned on
> Navigator::HasCameraSupport.  Please file bugs on the parts of it that are
> not thus conditioned right now (e.g. window.CameraManager).

I've filed new bug 983180 to track this for all of the camera stuff.

> Also, once DOMPoint lands we may want to use it here.

I went looking for a DOMPoint when I implemented this, but couldn't find it. Will keep an eye on bug 850805.

> >+  readonly attribute unsigned long id;
> 
> We're presuming this won't overflow, right?  I guess that's a fairly safe
> assumption...

This could overflow, but it's only used as a unique identifier for the lifetime of a visible face. I've updated the IDL to mention that if the face leaves the viewfinder and then returns, it will be assigned a new value.

> >+  readonly attribute DOMRect bound;
> 
> Why bound and not bounds?  Are we trying to match some other API here?

No--I just couldn't decide whether or not to pluralize this. I'll make it plural.

> >+  readonly attribute CameraPoint? leftEye;
> 
> No one cares about the bounds of the detected feature, just the centerpoint?

Correct. The camera only ever exports (x, y)-coordinates of the feature. I support we _could_ make this a DOMRect with zero-width and height.

> >+  attribute CameraFaceDetectionCallback? onFacesDetected;
> 
> It's a bit weird to have something other than an event handler have a name
> starting with "on"...  I'm not sure "facesDetected" or
> "facesDetectedCallback" is any better, though.

FWIW, the 'onBlah' naming is consistent with the rest of the CameraControl API.

> I assume in common usage this would be set once and then start/stop would
> happen a bunch of times, so making this an argument to startFaceDetection
> doesn't make sense?

That's correct, though you're not the first person to express confusion over the naming of the callbacks. I wonder if it's worth making these proper event handlers.

> >+++ b/dom/camera/DOMCameraDetectedFace.h
> >+class DOMCameraPoint MOZ_FINAL : public nsISupports
> 
> Why not just CameraPoint for the class name?  It's already in the
> mozilla::dom namespace, so the extra "DOM" is redundant.  We generally try
> to have the C++ name match the WebIDL one, especially in new code.
> 
> This applies to the filename too.  Why can't it just be
> CameraDetectedFace.h/cpp ?

Consistency with the rest of the CameraControl implementation. I've filed bug 983177 to clean this up (and move the non-DOM camera stuff out of dom/camera and into something more sensible, like maybe content/camera).

> >+++ b/dom/camera/DOMCameraControl.cpp
> 
> >+NS_IMPL_CYCLE_COLLECTION_INHERITED_20(nsDOMCameraControl, DOMMediaStream,
> 
> Where is NS_IMPL_CYCLE_COLLECTION_INHERITED_20 defined?

It got sucked into the non-DOM half of the patch, here:
https://bugzilla.mozilla.org/attachment.cgi?id=8389276&action=diff#a/xpcom/glue/nsCycleCollectionParticipant.h_sec2

> >+++ b/dom/camera/DOMCameraControlListener.cpp
> >+      , mFaces(aFaces)
> 
> I assume these arrays are never long enough to worry about swapping instead
> of copying here?x

Just a handful of faces. Also, while we don't do it now, the non-DOM layer supports multiple listeners, in which case each one would need to get its own copy of the faces array--or at least a reference to a common one.

> >+++ b/dom/tests/mochitest/general/test_interfaces.html
> 
> You won't need changes to this file if you mark the interfaces as
> conditional.
> 
> r=me with the above dealt with, and sorry again for the lag.

Not a problem. Thank you very much for the detailed review.
Incorporate review feedback; carrying r+en forward.

try-server push: https://tbpl.mozilla.org/?tree=Try&rev=bc3c38d504f1&showall=1
Attachment #8388110 - Attachment is obsolete: true
Attachment #8389276 - Attachment is obsolete: true
Attachment #8390598 - Flags: review+
> like maybe content/camera

We're trying to get rid of content/ and move all of that into dom/, fwiw.
Incorporate off-line feedback from bz, re: CameraDetectedFace.webidl. This patch merges it into CameraControl.webidl.

try-server push: https://tbpl.mozilla.org/?tree=Try&rev=44e10c84e9a2&showall=1
Attachment #8390598 - Attachment is obsolete: true
Attachment #8390792 - Flags: review+
See Also: → 983180
Remove stale header references. Will do a try-server run once everything is re-opened.
Attachment #8390792 - Attachment is obsolete: true
Attachment #8391269 - Flags: review+
DOMPoints landed in bug 917755, so it's probably worth updating this patch to use those instead of the (new) CameraPoint interface.
https://hg.mozilla.org/mozilla-central/rev/e6ac52f874d0
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Blocks: 999396
Just poking through the Android documentation, and I assume this will apply to us as well so I'm sticking it here: "When the face detection is running, setWhiteBalance(String), setFocusAreas(List), and setMeteringAreas(List) have no effect. The camera uses the detected faces to do auto-white balance, auto exposure, and autofocus."[1]

This probably means we'll need to disable face-tracking when a touch-focus event occurs.

1. http://developer.android.com/reference/android/hardware/Camera.html#startFaceDetection%28%29
You need to log in before you can comment on or make changes to this bug.