Closed Bug 1094070 Opened 10 years ago Closed 9 years ago

[MTBF][B2G][Camera][Media Server] camera screen become black with buttons or stating that the camera resource is not available

Categories

(Firefox OS Graveyard :: Stability, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.1+, b2g-v2.1 fixed, b2g-v2.2 fixed)

RESOLVED FIXED
blocking-b2g 2.1+
Tracking Status
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: wachen, Unassigned)

References

Details

Attachments

(1 file)

First of all, thanks Cervantes for tracking this bug down.

This bug happened in all devices last MTBF testing.
Base: V188
Build: PVT v2.1 20141030161200

When the test ended, we found that devices:
1. camera shows black screen with buttons (not functional)
2. even when we killed camera and restart it, it will not recover.

We can find the following message in adb logcat when camera launched
W/AudioFlinger(15312): Thread AudioOut_2 cannot connect to the power manager service
W/AudioFlinger(15312): Thread AudioOut_2 cannot connect to the power manager service
E/AudioFlinger(15312): no wake lock to update!
W/CameraService(15312): CameraService::connect X (pid 31972) rejected (existing client).

We also found that we can kill the media server and camera, and it will be normal launching camera app. We suspect it's media server issue in gonk. 

Someone please look into this.
[Blocking Requested - why for this release]:
Camera app won't work, and MTBF will stop.
Blocks: MTBF-B2G
blocking-b2g: --- → 2.1?
Evelyn, Cervantes,
Please help with needinfo media team in USA (I don't know who exactly need to be pinged)
Flags: needinfo?(ehung)
Flags: needinfo?(cyu)
This is a full package with dmesg, b2g-info, b2g-ps, logcat, and so on.
Please download here -->
https://mozilla.box.com/s/arcgxm0e1jvswxq71j69

After logcat281, camera is not able to work normally.
After logcat454, camera is showing some message like this:
E/AudioFlinger(15312): no wake lock to update!
W/CameraService(15312): CameraService::connect X (pid 31972) rejected (existing client).
(In reply to Walter Chen[:ypwalter][:wachen] from comment #0)
> 
> We also found that we can kill the media server and camera, and it will be
> normal launching camera app. We suspect it's media server issue in gonk. 
> 
More precisely, kill mediaserver and restart the Camera app. The camera app then previews/shoots correctly.

Mike, we suspect that we encounter a bug in mediaserver. Can you confirm this?
Flags: needinfo?(cyu) → needinfo?(mhabicher)
Flags: needinfo?(ehung) → needinfo?(hkoka)
(In reply to Walter Chen[:ypwalter][:wachen] from comment #3)
> This is a full package with dmesg, b2g-info, b2g-ps, logcat, and so on.
> Please download here -->
> https://mozilla.box.com/s/arcgxm0e1jvswxq71j69
> 
> After logcat281, camera is not able to work normally.
> After logcat454, camera is showing some message like this:
> E/AudioFlinger(15312): no wake lock to update!
> W/CameraService(15312): CameraService::connect X (pid 31972) rejected
> (existing client).

These lines don't appear anywhere in the files contained in the attached archive. grep shows:

# grep -rn "CameraService" *
logcat625:1301:11-02 20:34:36.166 15312 15312 I CameraService: CameraService started (pid=15312)
logcat625:1304:11-02 20:34:36.196 15312 15312 I CameraService: Loaded "QCamera Module" camera module
logcat734:1226:11-03 05:07:40.205 15312 15245 W CameraService: Disconnecting camera client 0xb8d25e58 since the binder for it died (this pid 15312)
logcat756:1017:11-03 06:53:24.668 15312 15245 W CameraService: Disconnecting camera client 0xb8d21f30 since the binder for it died (this pid 15312)

I also noticed this in dmesg645:

<3>[199111.024119] BUG: sleeping function called from invalid context at /local/build/foxfone-one-v1.4-release/v184/kernel/arch/arm/mm/fault.c:311

The string "v188" doesn't appear at all. Are you sure, per comment 0, that your test device is running v188?
Flags: needinfo?(mhabicher) → needinfo?(wachen)
I wrote this up in bug 1093797 yesterday. Could be related. Black camera viewfinder with buttons. I was able to get 100% steps.
Setting depends bug 1093797 for gathering more evidence.
Depends on: 1093797
I think I flashed all of them to v188
I will try to get the correct log for you or provide you with a new log for this (a new run of MTBF tests started yesterday)
Flags: needinfo?(wachen)
Flags: needinfo?(wachen)
Blocks: 1084179
I figured that some of those devices were indeed v184. However, after I ran it for a day, I found that it did happened in 3 devices (out of 15) alrdy. We will see if more devices fail tomorrow (average MTBF time is long)

Please do start to look into bug and provide helps, or we will never sign off this MTBF blocker by cc of v2.1. A bug happens in v180, v184, or v188 won't just disappear automatically. It will be there as always.
Flags: needinfo?(wachen)
Attached image v188 20141111 e474daf4
I am still looking for more phone to fail now. This screenshot is from the only one failed now.
Also, you can find the log here:

again, if I kill the mediaserver:
media     205   1     78312  480   ffffffff b6f4fa78 S /system/bin/mediaserver
root@flame:/ # kill -9 205
media     27700 1     66876  4472  ffffffff b6e7e6fc S /system/bin/mediaserver
I can bring the camera back when mediaserver came back.

https://mozilla.box.com/s/pnjarbzhhtxj1mfum7s0
Flags: needinfo?(mhabicher)
Given comment 9 and MTBF concerns, 2.1+.
blocking-b2g: 2.1? → 2.1+
Summary: [MTBF][B2G][Camera][Media Server] black camera screen with buttons → [MTBF][B2G][Camera][Media Server] camera screen become black with buttons or stating that the camera resource is not available
It still happening across almost all devices. However, there is seldom black screen on camera but rather showing "Camera is in use by another app."
Diego, According to https://bugzilla.mozilla.org/show_bug.cgi?id=1097835#c4, Is this bug a duplicate of bug 1097835.
Flags: needinfo?(dmarcos)
Hi Walter, you ni?-ed me -- what info do you need?
Flags: needinfo?(mhabicher) → needinfo?(wachen)
Hi, Mike

I think that you once indicated that it is not v188 build. Now, I have a 90% reproduction rate of this bug among all the devices. It is affecting MTBF testing in Camera, Video Recording, Video, and Photo. You can find link to download logs in comment 10. I also have all other devices not reaching v2.1 target goal and I can provide other logs for you.


Hi, Bhavana, 

Can you help to push it in your side? Let's finish this bug, and I believe it could reach 100hr+ in our side after this resolved.





Just mummuring..
We need people look into this bug...before v2.1 cc. I am amazed by the slowness of processing MTBF bugs comparing to CAF's bugs. They will finally get to a point that they discover such bugs.
Depends on: 1099390
Flags: needinfo?(wachen) → needinfo?(bbajaj)
(In reply to Walter Chen[:ypwalter][:wachen] from comment #15)
> Just mummuring..
> We need people look into this bug...before v2.1 cc. I am amazed by the
> slowness of processing MTBF bugs comparing to CAF's bugs. They will finally
> get to a point that they discover such bugs.
I wonder if this is because we set the wrong component in this bug.
Yes, this bug causes the stability problem. However, if we identify the problem happens in Camera app,
should we change component to *Gaia::Camera* from *stability*? Then we can get owner's attention.
It's a media server issue as we suspected. This would affect camera, video recording, photo, video, ... This bug would serve as a meta-like bug for tracking this issue. We suspect this to be camera issue, but before we jump into conclusion, we might need others to further investigate it.
(In reply to Walter Chen[:ypwalter][:wachen] from comment #17)
> It's a media server issue as we suspected. This would affect camera, video
> recording, photo, video, ... This bug would serve as a meta-like bug for
> tracking this issue. We suspect this to be camera issue, but before we jump
> into conclusion, we might need others to further investigate it.

Mike,

Can you check the logs and see if there is anything else that needs to be fixed, other than media server issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1097835 that you and Diego are working on. 

Thanks
Hema
Flags: needinfo?(hkoka) → needinfo?(mhabicher)
Looking through the logs attached in comment 10, I see no occurrences of the error in comment 0:

W/CameraService(15312): CameraService::connect X (pid 31972) rejected (existing client).

I do, however, see the following Gaia error:

logcat186:167:11-11 02:06:02.542 32608 32608 E GeckoConsole: [JavaScript Error: "TypeError: t is undefined" {file: "app://camera.gaiamobile.org/js/controllers/storage.js" line: 1}]

Not sure if that's related.

Walter, can you repeat the MTBF test with the following line added to /system/bin/b2g.sh, right before the line that launches b2g?

export NSPR_LOG_MODULES=Camera:4

This will enable logging in the camera, which should provide mode information on the cause of the failure.
Flags: needinfo?(mhabicher) → needinfo?(wachen)
Looking at storage.js[1], I don't see where the error in comment 19 is coming from. Diego, any ideas?

1. https://github.com/mozilla-b2g/gaia/blob/v2.1/apps/camera/js/controllers/storage.js
(In reply to Mike Habicher [:mikeh] from comment #19)
> Looking through the logs attached in comment 10, I see no occurrences of the
> error in comment 0:
> 
> W/CameraService(15312): CameraService::connect X (pid 31972) rejected
> (existing client).
> 
> I do, however, see the following Gaia error:
> 
> logcat186:167:11-11 02:06:02.542 32608 32608 E GeckoConsole: [JavaScript
> Error: "TypeError: t is undefined" {file:
> "app://camera.gaiamobile.org/js/controllers/storage.js" line: 1}]
> 
> Not sure if that's related.
> 
> Walter, can you repeat the MTBF test with the following line added to
> /system/bin/b2g.sh, right before the line that launches b2g?
> 
> export NSPR_LOG_MODULES=Camera:4
> 
> This will enable logging in the camera, which should provide mode
> information on the cause of the failure.

Walter, please provide the logs with this change? Mikeh is out for almost a week starting tomorrow (21st). Please Ni aosmond in his absence for re-checking logs. 

Thanks
HEma
(In reply to Mike Habicher [:mikeh] from comment #20)
> Looking at storage.js[1], I don't see where the error in comment 19 is
> coming from. Diego, any ideas?
> 
> 1.
> https://github.com/mozilla-b2g/gaia/blob/v2.1/apps/camera/js/controllers/
> storage.js

The log points to the uglified version of storage.js. I need an STR to investigate further. Who can provide one? Can anybody reproduce the storage error again on with a non uglified version of camera? (Provide gaia version used)

make install-gaia APP=camera GAIA_OPTIMIZE=0
Flags: needinfo?(dmarcos) → needinfo?(mhabicher)
How does one run these MTBF tests, anyway?

Another factor confounding any investigation is that the sequential logcats seem to be spaced about 3 minutes apart. Given that the logcats (admittedly, I only looked at a sample) seem to be less than 2 minutes long, suggests that we're missing >50% of the logcat output. (This also assumes a uniform logcat output, which may be an invalid assumption.)

Hopefully the extra camera logging can shine some light on what's going on. The current output isn't very usable. How can we improve it?
(In reply to Mike Habicher [:mikeh] from comment #23)
> How does one run these MTBF tests, anyway?

So here's more details on the set-up and the exact test suite : https://wiki.mozilla.org/QA/Automation/Projects/MTBF
The MTBF team run the tests in https://github.com/Mozilla-TWQA/MTBF-Driver for crazy hours on multiple devices with a goal of hitting 100hrs of stability without crashing (walter should be able to give more details here) I believe there should be a to remotely connect to the affected device or run this tests on your own, but walter can confirm that.


> 
> Another factor confounding any investigation is that the sequential logcats
> seem to be spaced about 3 minutes apart. Given that the logcats (admittedly,
> I only looked at a sample) seem to be less than 2 minutes long, suggests
> that we're missing >50% of the logcat output. (This also assumes a uniform
> logcat output, which may be an invalid assumption.)
> 
> Hopefully the extra camera logging can shine some light on what's going on.
> The current output isn't very usable. How can we improve it?


walter, it'll be good to understand  a) at what point in your MTBF testing are you hitting this scenario,(is it at 10mins or 10 hours ?) b) Can you confirm you are seeing it in the latest testrun given https://bugzilla.mozilla.org/show_bug.cgi?id=1093797 has been fixed c)Can you provide the extra logging info that Mike is requesting ?
Flags: needinfo?(bbajaj)
Diego, Mike,
For mtbf setup please check
https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/MTBF_tests

You need to re-install latest gaiatest in gaia/tests/python/gaia-ui-test/ by "python setup install"
Specifying variable MTBF_CONF=conf/flame_v2.1.json and MTBF_TIME=50h (50 hours) to make sure it can be reproducible.
wachen, when reproducing this issue, please also use a de-optimized camera app as per comment 22.
Flags: needinfo?(mhabicher)
Hi, Bhavana,

a) not sure, like few hours after
b) it reproduce all the time (100%) since I reported this issue. again, it is 100% reproducible.
c) extra log in comment 3 and 10

Hi, Mike,

Please do try to clear request and ni? me again next time. First of all, you can try to do that yourself. It will be far faster than asking me to do that. Second, I was busy applying like 5 different patches to solve other critical issues. If you have difficulties setting up them, I will try to set up it and run later.
Flags: needinfo?(wachen)
Hi! Mike, Walter,

Since no update for a long while, is this case still an issue?

--
Keven
Flags: needinfo?(wachen)
Flags: needinfo?(mlien)
I haven't seem these for a long time in v2.2. As far as I remember, it doesn't reproduce anymore in v2.1, either. I think the patches on the depending bugs works.

Thanks.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(wachen)
Resolution: --- → FIXED
Flags: needinfo?(mlien)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: