[V1.4][Camera][Dolphin] Repeatedly tap video record button causes app crash

VERIFIED FIXED in Firefox 32

Status

()

defect
--
critical
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: whsu, Assigned: sotaro, NeedInfo)

Tracking

({crash})

unspecified
mozilla33
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:1.4+, firefox31 wontfix, firefox32 fixed, firefox33 fixed, b2g-v1.4 fixed, b2g-v2.0 verified, b2g-v2.1 verified)

Details

(Whiteboard: [caf-crash 277][caf priority: p1][CR 693945][sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1], crash signature)

Attachments

(5 attachments)

* Description:
  This problem happens on v1.4 build.
  Repeatedly tap video record button causes app crash.
  Flame and Dolphin device can reproduce it.

  ~ Crash Report: Flame
    - https://crash-stats.mozilla.com/report/index/05bccc3d-96a0-4eee-801c-86cb72140708
    - https://crash-stats.mozilla.com/report/index/ce979230-1511-4684-bacd-7076d2140708
  ~ Crash Report: Dolphin
    - https://crash-stats.mozilla.com/report/index/2a368529-0a8e-4c0f-a793-614eb2140708

* Reproduction steps:
  1. Launch camera app
  2. Switch to video record mode
  3. Repeatedly tap video record button

* Reproduction rate:
  - 20%~30%

* Expected result:
  You can repeatedly tap video record button. No crash happened.

* Actual result:
  App crash happened.

* Reproduction build:
  @ Dolphin - V1.4
    - Gaia      f2c118767aa26981386570e5d0bed95bab593de5
    - Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/cb7083a89b04
    - BuildID   20140707160203
    - Version   30.0
    
  @ Flame - V1.4
    - Gaia      318f8d814d8930b5530bcb0badc1bb5bd0b5ef45
    - Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/ef78ebf203e5
    - BuildID   20140708000310
    - Version   30.0
Unable to reproduce on full-memory Flame running:

Gonk      v122
Gaia      f71bf6ee534275a44ae3e475e9e2a5dfeebbc0ae
Gecko     https://hg.mozilla.org/integration/b2g-inbound/rev/e2402e271fb6
BuildID   20140709032231
Version   33.0a1
ro.build.version.incremental=108
ro.build.date=Tue Jun 10 19:40:40 CST 2014
Unable to reproduce on full-memory Flame running:

Gonk      v122
Gaia      b0e9b4bdb39c5eb93a6783a34624ffc84f62b126
Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/acf704e54e19
BuildID   20140709000201
Version   30.0
ro.build.version.incremental=108
QA Wanted to double check STR & reproduction rate.
Severity: normal → critical
Keywords: crash, qawanted
Whiteboard: [b2g-crash]
I was only able to reproduce this issue once with the above steps. Further repro was attempted for almost an hour with no more crashes. The device was reset once, and reflashed once. 


Flame 1.4
BuildID: 20140708000310
Gaia: 318f8d814d8930b5530bcb0badc1bb5bd0b5ef45
Gecko: ef78ebf203e5
Version: 30.0 (1.4) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

Actual:
The Camera app crashed and the User was given a crash report to send.

Repro Rate: 1/10 (Once in an hour)
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
QA Contact: ekramer
QA-Wanted Report: Not nomming, really low repro rate, possible fringe case
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
spreadtrum get this crash again with monkey test. 4 of their 5 devices have this problem.
Whiteboard: [b2g-crash] → [sprd331654][partner-blocker][b2g-crash]
It might be memory related issue. You can use low-memory flame to do the test.
Set memory = 273 MB. Thanks!
blocking-b2g: --- → 1.4?
I found a way to consistently reproduce this:
------------------------------------------------------------------------
1. Launch Camera
2. Switch to video mode
3. Tap on start taking video
4. Rapidly tap on stopping taking video for 2~3 times
   --> Camera crash.
BTW, the test device for me is dolphin.
With the following build on a Flame configured with a 512MiB memory limit:

Gonk      v122
Gaia      b0e9b4bdb39c5eb93a6783a34624ffc84f62b126
Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/ccabaf8826a4
BuildID   20140709160208
Version   30.0
ro.build.version.incremental=108

...the STR in comment 8 do not cause the Camera app to crash; video recording stops properly.

If the memory limit is dropped to 273MiB, starting recording causes the Camera app (and just about everything else) to get killed:

<6>[   24.231973] send sigkill to 959 ((Preallocated a), adj 667, size 1997
<6>[   49.504431] send sigkill to 1235 ((Preallocated a), adj 667, size 2666
<6>[   55.053535] send sigkill to 1036 (Homescreen), adj 534, size 1416
<6>[   68.775487] send sigkill to 1096 (Camera), adj 134, size 2445

This is the same issue as in bug 1027592.

pcheng, please repeat your STR on Dolphin and run the following command afterwards:

# adb shell dmesg | grep sigkill
Flags: needinfo?(pcheng)
I just checked but didn't get anything from this command "adb shell dmesg | grep sigkill".
Flags: needinfo?(pcheng)
I will get and attach here.
Posted file Camera.log
test starts: 07-10 12:03:34.780
1.4+ please.
Flags: needinfo?(wchang)
>   ~ Crash Report: Flame
>     -
> https://crash-stats.mozilla.com/report/index/05bccc3d-96a0-4eee-801c-
> 86cb72140708
>     -
> https://crash-stats.mozilla.com/report/index/ce979230-1511-4684-bacd-
> 7076d2140708

I encountered these crashes on 1GB flame. FYI
Seems like a common issue according to comments.

Bhavana, please also see if 2.0 needs this.
blocking-b2g: 1.4? → 1.4+
Flags: needinfo?(wchang) → needinfo?(bbajaj)
Posted file camera-crash.txt
Following the repo steps in Comment 8, I always got the below messages.

I/GeckoDump( 1105): Crash reporter : Can't fetch app.reportCrashes. Exception: [Exception... "Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIPrefBranch.getBoolPref]"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://b2g/content/shell.js :: shell_reportCrash :: line 144"  data: no]
E/GeckoConsole( 1144): Could not read chrome manifest 'file:///system/b2g/chrome.manifest'.

The detailed logcat was attached.
From issue tracking, the symptom of this bug seems like I saw in bug 1016805 comment 7. In my guess, we should check this with bug 1016805.
Flags: needinfo?(sotaro.ikeda.g)
Assignee: nobody → jdarcangelo
I can confirm this crash using the STR in comment 8, running:
- gecko: b2g30_v1_4:896fa800b72d
- gaia: v1.4:e273c1f52ed7187e4e0b2d66ed5718f0f20c6eeb
Assignee: jdarcangelo → mhabicher
(In reply to Mike Habicher [:mikeh] from comment #20)
> I can confirm this crash using the STR in comment 8, running:
> - gecko: b2g30_v1_4:896fa800b72d
> - gaia: v1.4:e273c1f52ed7187e4e0b2d66ed5718f0f20c6eeb

Just to be clear, this build was running on a Dolphin. (Still) unable to reproduce on a Flame.
Summary: [V1.4][Camera][Flame/Dolphin] Repeatedly tap video record button causes app crash → [V1.4][Camera][Dolphin] Repeatedly tap video record button causes app crash
(In reply to Vincent Liu[:vliu] from comment #19)
> From issue tracking, the symptom of this bug seems like I saw in bug 1016805
> comment 7. In my guess, we should check this with bug 1016805.

This seems similar to bug 1016805. But it is not clear where it happens. It could be a similar place or could be a different place.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to pcheng from comment #14)
> Created attachment 8453539 [details]
> Camera.log
> 
> test starts: 07-10 12:03:34.780

From the following log, the problem seems to happen during thumbnail generation.
-------------------------------------

07-10 12:04:12.407 V/GonkBufferQueue(  692): cancelBuffer: slot=3
07-10 12:04:12.407 V/OMXCodec(  692): [OMX.sprd.h264.decoder] freeing buffer 0xb8a926d0 on port 1
07-10 12:04:12.407 V/OMXCodec(  692): [OMX.sprd.h264.decoder] Calling cancelBuffer on buffer 0xb8a926d0
07-10 12:04:12.407 V/GonkBufferQueue(  692): cancelBuffer: slot=2
07-10 12:04:12.407 V/OMXCodec(  692): [OMX.sprd.h264.decoder] freeing buffer 0xb8a9f750 on port 1
07-10 12:04:12.407 V/OMXCodec(  692): [OMX.sprd.h264.decoder] Calling cancelBuffer on buffer 0xb8a9f750
07-10 12:04:12.407 V/GonkBufferQueue(  692): cancelBuffer: slot=1
07-10 12:04:12.407 F/OMXCodec(  692): frameworks/av/media/libstagefright/OMXCodec.cpp:2793 CHECK(info->mStatus == OWNED_BY_US || info->mStatus == OWNED_BY_NATIVE_WINDOW) failed.
(gdb) bt full
#0  tgkill () at bionic/libc/arch-arm/bionic/tgkill.S:46
No locals.
#1  0xb6eb804c in pthread_kill (t=<optimized out>, sig=6) at bionic/libc/bionic/pthread_kill.cpp:49
        thread = <optimized out>
        tid = 1958
        rc = <optimized out>
#2  0xb6eb8260 in raise (sig=6) at bionic/libc/bionic/raise.cpp:32
        rc = <optimized out>
#3  0xb6eb6f94 in __libc_android_abort () at bionic/libc/bionic/abort.cpp:55
        mask = 3003093504
        sa = {_u = {_sa_handler = 0x1, _sa_sigaction = 0x1}, sa_mask = 2628304075, sa_flags = 9, sa_restorer = 0x9ca8bccb}
#4  0xb6ec6840 in abort () at bionic/libc/arch-arm/bionic/abort_arm.S:41
No locals.
#5  0xb4fc078e in __android_log_assert (cond=<optimized out>, tag=0xb50cf9cb "OMXCodec", fmt=0xb50bcb87 "%s") at system/core/liblog/logd_write.c:261
        buf = "frameworks/av/media/libstagefright/OMXCodec.cpp:2793 CHECK(info->mStatus == OWNED_BY_US || info->mStatus == OWNED_BY_NATIVE_WINDOW) failed.\000\214\027\240\263\000\000\000\000\204\063\357\266\305\004\374\264\000@\000\000\002\000\000\000e6b0,\027\240\263\001\000\000\000\313\371\f\265\t\000\000\000\214\027\240\263{\000\000\000mp 441311"...
#6  0xb5072122 in android::OMXCodec::allocateOutputBuffersFromNativeWindow (this=<optimized out>) at frameworks/av/media/libstagefright/OMXCodec.cpp:1908
No locals.
#7  0xb507536c in android::OMXCodec::start (this=0x2cf, meta=0xb2ff9200) at frameworks/av/media/libstagefright/OMXCodec.cpp:3738
        autoLock = {mLock = @0x0}
        params = {m_ptr = 0xb4bba0c0 <android::MediaMetadataRetriever::sServiceLock>}
        err = 479
#8  0x00000000 in ?? ()
Looking through the code with reference to comment 24, I think the stacktrace leading to the failing CHECK() is:

OMXCodec::start()
  allocateOutputBuffersFromNativeWindow()
    freeBuffersOnPort()
      CHECK()
        __android_log_assert()

allocateOutputBuffersFromNativeWindow() is probably failing on a call to one of allocateBuffer(), allocateBufferWithBackup(), or useBuffer() inside the for-loop, and hitting the on-error clean-up path ~line 1654. This calls freeBuffersOnPort(), which iterates over the buffers on a specified port.

This matches what sotaro called out in comment 23.
The offending BufferInfo::mStatus value is:

07-10 15:11:08.688   636   717 E OMXCodec: [OMX.sprd.h264.decoder] mikeh: info->mStatus = 3
07-10 15:11:08.688   636   717 F OMXCodec: frameworks/av/media/libstagefright/OMXCodec.cpp:2794 CHECK(info->mStatus == OWNED_BY_US || info->mStatus == OWNED_BY_NATIVE_WINDOW) failed.

...which means OWNED_BY_CLIENT[1].

1. http://androidxref.com/4.1.1/xref/frameworks/av/include/media/stagefright/OMXCodec.h#155
Another occurrence, different execution path:

(gdb) bt
#0  tgkill () at bionic/libc/arch-arm/bionic/tgkill.S:46
#1  0xb6e6804c in pthread_kill (t=<optimized out>, sig=6) at bionic/libc/bionic/pthread_kill.cpp:49
#2  0xb6e68260 in raise (sig=6) at bionic/libc/bionic/raise.cpp:32
#3  0xb6e66f94 in __libc_android_abort () at bionic/libc/bionic/abort.cpp:55
#4  0xb6e76840 in abort () at bionic/libc/arch-arm/bionic/abort_arm.S:41
#5  0xb4f7178e in __android_log_assert (cond=<optimized out>, tag=0xb508044b "OMXCodec", fmt=0xb506d56f "%s") at system/core/liblog/logd_write.c:261
#6  0xb5023554 in android::OMXCodec::freeBuffersOnPort (this=this@entry=0xb34df3e0, portIndex=portIndex@entry=1, onlyThoseWeOwn=onlyThoseWeOwn@entry=false) at frameworks/av/media/libstagefright/OMXCodec.cpp:2806
#7  0xb502385a in android::OMXCodec::onStateChange (this=this@entry=0xb34df3e0, newState=newState@entry=OMX_StateIdle) at frameworks/av/media/libstagefright/OMXCodec.cpp:2705
#8  0xb5025310 in android::OMXCodec::onCmdComplete (this=this@entry=0xb34df3e0, cmd=cmd@entry=OMX_CommandStateSet, data=data@entry=2) at frameworks/av/media/libstagefright/OMXCodec.cpp:2525
#9  0xb5025f50 in android::OMXCodec::onEvent (this=this@entry=0xb34df3e0, event=<optimized out>, data1=0, data2=2) at frameworks/av/media/libstagefright/OMXCodec.cpp:2425
#10 0xb5026128 in android::OMXCodec::on_message (this=0xb34df3e0, msg=...) at frameworks/av/media/libstagefright/OMXCodec.cpp:2141
#11 0xb50266e8 in android::OMXCodecObserver::onMessage (this=0xb0870de0, msg=...) at frameworks/av/media/libstagefright/OMXCodec.cpp:111
#12 0xb4b3ce5a in android::BpOMX::emptyBuffer (this=0xb0870de0, node=0xb3901ca4, buffer=0x19, range_offset=3012566180, range_length=3012567404, flags=108, timestamp=463856468076) at frameworks/av/media/libmedia/IOMX.cpp:445
#13 0x00000000 in ?? ()
Assignee: mhabicher → sotaro.ikeda.g
By analyzing the problem, this bug is a regression of Bug 1029719 :-(

RemoveTextureFromCompositableTracker::ReleaseTextureClient() creates  TextureClientReleaseTask and post to ISurfaceAllocator's message loop. It changed OMXCodec's shutdown sequence. OMXCodec's shutdown sequence has a strict restriction to the order. Bug 1029719 broke the order.
The patch fix the problem for me on v1.4 dolphin.
Attachment #8454567 - Flags: review?(nical.bugzilla)
Status: NEW → ASSIGNED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
Comment on attachment 8454567 [details] [diff] [review]
patch - Fix RemoveTextureFromCompositableTracker::ReleaseTextureClient()

Review of attachment 8454567 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/CompositableClient.cpp
@@ +96,5 @@
>      RefPtr<ISurfaceAllocator> allocator = mTextureClient->GetAllocator();
>      mTextureClient = nullptr;
>      allocator->GetMessageLoop()->PostTask(FROM_HERE, task);
> +  } else {
> +    mTextureClient = nullptr;

Looks like mTextureClient = nullptr will be executed no matter if the if clause is valid or not. (Line 100 and line 97). In this case, why don't we remove the else clause, and move mTextureClient = nullptr; to the last line of ReleaseTextureClient() before it exits? Then, the codes will look clean and better. What do you think?
Duplicate of this bug: 1019422
Crash Signature: __libc_android_abort | __write_to_log_init
Whiteboard: [sprd331654][partner-blocker][b2g-crash] → [sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1]
(In reply to Kevin Hu [:khu] from comment #30)
> Comment on attachment 8454567 [details] [diff] [review]
> patch - Fix RemoveTextureFromCompositableTracker::ReleaseTextureClient()
> 
> Review of attachment 8454567 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/client/CompositableClient.cpp
> @@ +96,5 @@
> >      RefPtr<ISurfaceAllocator> allocator = mTextureClient->GetAllocator();
> >      mTextureClient = nullptr;
> >      allocator->GetMessageLoop()->PostTask(FROM_HERE, task);
> > +  } else {
> > +    mTextureClient = nullptr;
> 
> Looks like mTextureClient = nullptr will be executed no matter if the if
> clause is valid or not. (Line 100 and line 97). In this case, why don't we
> remove the else clause, and move mTextureClient = nullptr; to the last line
> of ReleaseTextureClient() before it exits? Then, the codes will look clean
> and better. What do you think?

"mTextureClient = nullptr" can not move to the last line. It is needed to be called before "allocator->GetMessageLoop()->PostTask(FROM_HERE, task);".
Okay, thanks, Sotaro.
Component: Gaia::Camera → Graphics: Layers
Product: Firefox OS → Core
Milan, is Nicolas available for review? Thanks.
Flags: needinfo?(milan)
Flags: needinfo?(Dafeng.Xu)
Duplicate of this bug: 1038037
Attachment #8454567 - Flags: review?(nical.bugzilla) → review+
Flags: needinfo?(milan)
https://hg.mozilla.org/mozilla-central/rev/a894aaa5b7bc
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
there has a side effect Bug 1039937, camera performance will become much worse
Yes! As Mike mentioned, the crash cannot be reproduced but the camera's performance becomes worse.
Please solve performance problem that Bug 1039937 mentioned.
Thanks!

* Build information:
 - Gaia      e0a732cf172d125eadfad59f0faf8f73b62012b8
 - Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/074223c547af
 - BuildID   20140718000231
 - Version   30.0
Status: RESOLVED → VERIFIED
Duplicate of this bug: 1038891
Whiteboard: [sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1] → [CR 693945][sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1]
Whiteboard: [CR 693945][sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1] → [caf priority: p1][CR 693945][sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1]
No longer blocks: CAF-v2.0-FC-metabug
I tried to add some logs to buri, flame and dolphin to compare the performance for taking picture. These devices are all run on v1.4 branch. The result of profiling data is attached. There are two items need to be discussed.

1. For measuring time of auto focus, Dolphin runs about 8 times than other devices. Please see column B for more detailed. 

2. For measuring time of taking picture, Dolphin sometimes takes above 2 seconds when the first launch after reboot.
(In reply to Vincent Liu[:vliu] from comment #43)
> Created attachment 8461358 [details]
> profiling_for_taking_picture.ods
> 
> I tried to add some logs to buri, flame and dolphin to compare the
> performance for taking picture. These devices are all run on v1.4 branch.
> The result of profiling data is attached. There are two items need to be
> discussed.
> 
> 1. For measuring time of auto focus, Dolphin runs about 8 times than other
> devices. Please see column B for more detailed. 
> 
> 2. For measuring time of taking picture, Dolphin sometimes takes above 2
> seconds when the first launch after reboot.

Sorry to post profiling data to wrong bug id.
Whiteboard: [caf priority: p1][CR 693945][sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1] → [caf-crash 277][caf priority: p1][CR 693945][sprd331654][partner-blocker][b2g-crash][1.4-dolphin-test-run-1]
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.026
Moz BuildID: 20140707000200
B2G Version: 1.4
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ef67af27dff3130d41a9139d6ae7ed640c34d922
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d3eae03cdf4e6944e646d05938a22dc1380a0d95
Posted video video
Following the repo steps in Comment 8, this issue has been verified successfully on Flame2.0 and 2.1
See attachment: Verify_1035755.MP4
Reproducing rate: 0/20

Flame2.0 build:
Gaia-Rev        8d1e868864c8a8f1e037685f0656d1da70d08c06
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/ff1100ba2ab8
Build-ID        20141204000228
Version         32.0

Flame2.1 build:
Gaia-Rev        5655269098c7e82254e56933f1af05b4abe2a2f3
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/86608c9389b5
Build-ID        20141204001201
Version         34.0
You need to log in before you can comment on or make changes to this bug.