getUserMedia camera fails to start on android sometimes due to preview/picture size mismatch

VERIFIED FIXED in Firefox 60

Status

()

defect
P2
normal
Rank:
15
VERIFIED FIXED
Last year
Last year

People

(Reporter: pehrsons, Assigned: pehrsons)

Tracking

53 Branch
mozilla61
Unspecified
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox59 wontfix, firefox60 verified, firefox61 verified)

Details

Attachments

(2 attachments)

On a Samsung Galaxy S8 I see this in logcat for supported capabilites:
> 02-27 17:31:20.110  3350  3820 D ExynosCamera1Parameters: [CAM_ID(1)][ParametersFront]-DEBUG(setDefaultParameter[971]): KEY_SUPPORTED_VIDEO_SIZES       1920x1080,1440x1440,1440x1080,1088x1088,1280x720,960x720,800x450,720x480,640x480,480x320,352x288,320x240,256x144,176x144

While in MediaManager when choosing the most suitable capability we choose between these:
> 77330 02-27 17:31:20.056 30254 30591 I Gecko   : 2018-02-27 16:31:20.056290 UTC - [30254:MediaManager]: D/MediaManager ChooseCapability: prefs: 640x480       @30fps
> 77331 02-27 17:31:20.056 30254 30591 I Gecko   : 2018-02-27 16:31:20.056337 UTC - [30254:MediaManager]: D/MediaManager Constraints: width: { min: -2147      483647, max: 2147483647 }
> 77332 02-27 17:31:20.056 30254 30591 I Gecko   : 2018-02-27 16:31:20.056521 UTC - [30254:MediaManager]: D/MediaManager              height: { min: -214      7483647, max: 2147483647 }
> 77333 02-27 17:31:20.056 30254 30591 I Gecko   : 2018-02-27 16:31:20.056567 UTC - [30254:MediaManager]: D/MediaManager              frameRate: { min: -      inf, max: inf }
> 77334 02-27 17:31:20.058 30254 30591 I Gecko   : 2018-02-27 16:31:20.058497 UTC - [30254:MediaManager]: D/MediaManager Capability: 1920 x 1080 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77335 02-27 17:31:20.059 30254 30591 I Gecko   : 2018-02-27 16:31:20.059185 UTC - [30254:MediaManager]: D/MediaManager Capability: 1440 x 1080 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77336 02-27 17:31:20.059 30254 30591 I Gecko   : 2018-02-27 16:31:20.059638 UTC - [30254:MediaManager]: D/MediaManager Capability: 1088 x 1088 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77337 02-27 17:31:20.059 30254 30591 I Gecko   : 2018-02-27 16:31:20.059966 UTC - [30254:MediaManager]: D/MediaManager Capability: 1072 x 1072 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77338 02-27 17:31:20.060 30254 30591 I Gecko   : 2018-02-27 16:31:20.060354 UTC - [30254:MediaManager]: D/MediaManager Capability: 1280 x  720 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77339 02-27 17:31:20.060 30254 30591 I Gecko   : 2018-02-27 16:31:20.060680 UTC - [30254:MediaManager]: D/MediaManager Capability: 1056 x  704 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77340 02-27 17:31:20.061 30254 30591 I Gecko   : 2018-02-27 16:31:20.061023 UTC - [30254:MediaManager]: D/MediaManager Capability:  960 x  720 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77341 02-27 17:31:20.061 30254 30591 I Gecko   : 2018-02-27 16:31:20.061863 UTC - [30254:MediaManager]: D/MediaManager Capability:  800 x  450 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77342 02-27 17:31:20.062 30254 30591 I Gecko   : 2018-02-27 16:31:20.062312 UTC - [30254:MediaManager]: D/MediaManager Capability:  736 x  736 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77343 02-27 17:31:20.063 30254 30591 I Gecko   : 2018-02-27 16:31:20.063066 UTC - [30254:MediaManager]: D/MediaManager Capability:  720 x  480 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77344 02-27 17:31:20.063 30254 30591 I Gecko   : 2018-02-27 16:31:20.063794 UTC - [30254:MediaManager]: D/MediaManager Capability:  640 x  480 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77345 02-27 17:31:20.064 30254 30591 I Gecko   : 2018-02-27 16:31:20.064207 UTC - [30254:MediaManager]: D/MediaManager Capability:  352 x  288 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77346 02-27 17:31:20.064 30254 30591 I Gecko   : 2018-02-27 16:31:20.064605 UTC - [30254:MediaManager]: D/MediaManager Capability:  320 x  240 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77347 02-27 17:31:20.065 30254 30591 I Gecko   : 2018-02-27 16:31:20.064981 UTC - [30254:MediaManager]: D/MediaManager Capability:  256 x  144 x 30 max      Fps, NV21, Unknown codec. Distance = 0
> 77348 02-27 17:31:20.066 30254 30591 I Gecko   : 2018-02-27 16:31:20.066253 UTC - [30254:MediaManager]: D/MediaManager Capability:  176 x  144 x 30 max      Fps, NV21, Unknown codec. Distance = 0


Per bug 1441145 comment 1 when choosing 736x736 we fail to start capturing, resulting in a rejection of getUserMedia(). Based on the logcat output above this makes sense. But really, we should choose between the real capabilities available for this device.

Per bug 1441145 comment 6 this happens for many different devices and vendors, which makes me suspect a bug in webrtc.org.
Rank: 15
Priority: -- → P2
Comment on attachment 8965401 [details]
Bug 1441585 - Set picture size to something valid.

https://reviewboard.mozilla.org/r/234152/#review239770

lgtm

::: media/webrtc/trunk/webrtc/modules/video_capture/android/java/src/org/webrtc/videoengine/VideoCaptureAndroid.java:212
(Diff revision 1)
> +      for (Camera.Size size : supportedPictureSizes) {
> +        if (size.width >= width &&
> +            size.height >= height &&
> +            size.width <= pictureSize.width &&
> +            size.height <= pictureSize.height) {
> +          pictureSize = size;

It looks like this will end up keeping a smaller pictureSize than the desired width and height if it is the first item in the supportedPictureSizes list. 

Maybe sort supportedPictureSizes so largest is first? The Android API documentation doesn't mention whether or not the list is sorted, so we probably can't assume that it will come in an order that makes sense on all phones.
Attachment #8965401 - Flags: review?(dminor) → review+
Comment on attachment 8965401 [details]
Bug 1441585 - Set picture size to something valid.

https://reviewboard.mozilla.org/r/234152/#review239770

> It looks like this will end up keeping a smaller pictureSize than the desired width and height if it is the first item in the supportedPictureSizes list. 
> 
> Maybe sort supportedPictureSizes so largest is first? The Android API documentation doesn't mention whether or not the list is sorted, so we probably can't assume that it will come in an order that makes sense on all phones.

Good catch! For uplift I don't want to dabble with sorting or filtering or the like though so I'll try to just make the if a bit more explicit.
Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Upstream fixed this proper 3 years ago, [1].
We even had a (not proper IMHO) workaround in place until the webrtc.org 49 merge in Firefox 53, bug 1250356, [2], when it for some reason disappeared. The workaround came to be in bug 1129365, which is the exact duplicate of this, just 22 versions ago :-(

jib, how can we prevent this from happening again?

[1] https://chromium.googlesource.com/external/webrtc.git/+/a1c9803e3283d41da0256779a01e355a881d407b%5E%21/#F0
[2] https://hg.mozilla.org/mozilla-central/diff/143bb4b9249e/media/webrtc/trunk/webrtc/modules/video_capture/android/java/src/org/webrtc/videoengine/VideoCaptureAndroid.java
Blocks: 1250356
Flags: needinfo?(jib)
Summary: Android camera capture backend reporting wrong capabilities → getUserMedia camera fails to start on android sometimes due to preview/picture size mismatch
Version: 60 Branch → 53 Branch
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9505c6334875
Set picture size to something valid. r=dminor
(In reply to Andreas Pehrson [:pehrsons] from comment #5)
> jib, how can we prevent this from happening again?

Normally, I'd say push back in review on locally patching upstream code, without bugs filed upstream. But if I understand correctly, this was fixed upstream at the point in time we patched it locally. What was unusual seems to be that from what I understand from #media [1] the upstream code got moved out of webrtc.org? 

Regardless of cause, had we had tests for this, we'd be fine. Mistakes will happen whenever we operate without a safety net. So the answer to preventing this from happening IMHO is to have tests.

It's hard to test this one since it only happened on real devices, and only some of them.

I've been contemplating a couple of ideas:

A) Add a flag to run mochitests with real devices (if there isn't one already).
   Tests that rely on specific devices, like loopback devices, need to know to only run when said devices are available.
   Then ask QA to run the (whole or subset) mochitest suite with flag on real machines with usual-suspect devices occasionally.
   With this in place, begin to add tests specifically for real devices, knowing they would get run from time to time by QA.
   E.g. we could then have added a test specifically to test this invalid resolution workaround on S4 and S8,
   and reviewers could have insisted on this test being written before landing.

B) We could try to write a unit-test that somehow simulates the problematic devices (S4/S8/?) that tests that the workaround
   is in place and is functioning.

Basically, any code not covered by tests *will* break.

My 2 cents.

[1] https://mozilla.logbot.info/media/20180404#c14560262
Flags: needinfo?(jib)
Oh and

 C) We could be developing a new class of semi-automated mochitests to be run occasionally under supervision.
    These might have on-screen instructions on what success should look like, and what constitutes failure.
    This might be good for things that are hard for a computer to catch, like the picture is garbled or the audio is crap.

Or a combination perhaps. The specific thing needing to be tested may dictate the best approach.
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #7)
> What was unusual seems
> to be that from what I understand from #media [1] the upstream code got
> moved out of webrtc.org? 

It seems to me like the 49 merge in 53 at [2] grabbed an old upstream and didn't re-apply the changes we had already landed in our tree. The rollup of previous changes for the 49 merge is at [3] and contains a flat copy of an old upstream without our old patches present. The 49 merge itself did indeed remove the android backend, so it being moved out of upstream is for sure part of the reason.


[2] https://hg.mozilla.org/mozilla-central/diff/143bb4b9249e/media/webrtc/trunk/webrtc/modules/video_capture/android/java/src/org/webrtc/videoengine/VideoCaptureAndroid.java
[3] https://hg.mozilla.org/mozilla-central/diff/126348e718d0/media/webrtc/trunk/webrtc/modules/video_capture/android/java/src/org/webrtc/videoengine/VideoCaptureAndroid.java
[4] https://hg.mozilla.org/mozilla-central/diff/e10e9f0e5ca2/media/webrtc/trunk/webrtc/modules/video_capture/android/java/src/org/webrtc/videoengine/VideoCaptureAndroid.java
As for the test ideas I think all of A, B and C are worth looking into.

For this particular case I don't think B will be easy. The Android Camera API doesn't allow itself to be mocked so we'd have to write a wrapper around it that we use in the backend, and that would allow mocking in a java unit test. But given that this is upstream code, introducing code like this would make future merges much harder. It's in the hands of upstream.

This could definitely be tested by both A and C. I'll make a fiddle that takes this to an extreme, for manual validation. At the very least, I hope that can be taken into the QA manual regression test suite.
Please verify this on as many Android devices as possible as soon as this has landed. Desktop is useful to test too on all platforms but I don't suspect issues there.

Use the fiddle at [1]. Approve all gUM requests. On desktop you may use persistent permissions. It's done and successful when the log reads "DONE All 27 cases tested" at the bottom.


[1] https://jsfiddle.net/pehrsons/yrbxrzhq/
Flags: qe-verify+
https://hg.mozilla.org/mozilla-central/rev/9505c6334875
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #7)
> (In reply to Andreas Pehrson [:pehrsons] from comment #5)
> > jib, how can we prevent this from happening again?
> 
> Normally, I'd say push back in review on locally patching upstream code,
> without bugs filed upstream. But if I understand correctly, this was fixed
> upstream at the point in time we patched it locally. What was unusual seems
> to be that from what I understand from #media [1] the upstream code got
> moved out of webrtc.org? 
> 
> Regardless of cause, had we had tests for this, we'd be fine. Mistakes will
> happen whenever we operate without a safety net. So the answer to preventing
> this from happening IMHO is to have tests.
> 
> It's hard to test this one since it only happened on real devices, and only
> some of them.
> 
> I've been contemplating a couple of ideas:
> 
> A) Add a flag to run mochitests with real devices (if there isn't one
> already).
>    Tests that rely on specific devices, like loopback devices, need to know
> to only run when said devices are available.
>    Then ask QA to run the (whole or subset) mochitest suite with flag on
> real machines with usual-suspect devices occasionally.
>    With this in place, begin to add tests specifically for real devices,
> knowing they would get run from time to time by QA.
>    E.g. we could then have added a test specifically to test this invalid
> resolution workaround on S4 and S8,
>    and reviewers could have insisted on this test being written before
> landing.
> 
> B) We could try to write a unit-test that somehow simulates the problematic
> devices (S4/S8/?) that tests that the workaround
>    is in place and is functioning.
> 
> Basically, any code not covered by tests *will* break.
> 
> My 2 cents.
> 
> [1] https://mozilla.logbot.info/media/20180404#c14560262

I completely agree that more unit tests are the best way of preventing problems like this from happening in the future.

I'm a bit of a pessimist about opening bugs upstream. We're often far enough behind them that they have already fixed the problem upstream or the code has changed enough that our patches don't apply cleanly and we're stuck landing a different version of the patch upstream in the hopes it will still fix our problem the next time we update.

Having good unit tests will also make it easier to switch over to new upstream code without worrying about regressions. This isn't the only area where we're maintaining code removed from upstream in our local copy.
Comment on attachment 8965401 [details]
Bug 1441585 - Set picture size to something valid.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1250356 
[User impact if declined]: Camera capture may be unusable on some devices.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Not yet. I have verified the patch locally however.
[Needs manual test from QE? If yes, steps to reproduce]: Yes, see comment 11
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not really.
[Why is the change risky/not risky?]: Simple enough logic.
[String changes made/needed]: None
Attachment #8965401 - Flags: approval-mozilla-beta?
Comment on attachment 8965401 [details]
Bug 1441585 - Set picture size to something valid.

Straightforward fix, approved for 60.0b11.
Attachment #8965401 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Verified as fixed on latest Nightly build 61.0a1 - 2018/04/12 and Beta 60.0b11 following description from comment 11.
Devices:
 - Google Pixel (Android 8.1.0)
 - HTC 10 (Android 8.0.0)
 - Samsung Galaxy S8 (Android 7.0)
 - Huawei P9 Lite (Android 6.0)
 - Huawei Honor (Android 5.1.1)
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.