Last Comment Bug 784829 - Camera not working anymore
: Camera not working anymore
Status: VERIFIED FIXED
[qa!], [getUserMedia], [blocking-gum+]
:
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: unspecified
: ARM Gonk (Firefox OS)
: -- normal (vote)
: mozilla18
Assigned To: Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
: Jason Smith [:jsmith]
Mentors:
: 786081 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-22 14:55 PDT by Alexandre LISSY :gerard-majax
Modified: 2012-09-05 04:10 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
patch (2.52 KB, patch)
2012-08-22 20:36 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
no flags Details | Diff | Review
Logcat with the patch applied (44.28 KB, patch)
2012-08-23 04:42 PDT, Alexandre LISSY :gerard-majax
no flags Details | Diff | Review
fix (1.70 KB, patch)
2012-08-23 17:03 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
rjesup: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Review
Detailed log dump (775.67 KB, application/x-gzip)
2012-08-27 10:51 PDT, Mike Habicher [:mikeh] (high bugzilla latency)
no flags Details

Description Alexandre LISSY :gerard-majax 2012-08-22 14:55:54 PDT
No more preview working in Camera app, no more camera working. Since roughly 21st, August.

Logcat (when broken) :
$ adb logcat | grep -i "camera viewfinder"
I/Gecko   (  241): camera viewfinder is unknown
I/Gecko   (  241): camera viewfinder is consuming

Logcat (when working) :
$ adb logcat | grep -i "camera viewfinder"
I/Gecko   (  345): camera viewfinder is consuming
I/Gecko   (  345): camera viewfinder is not consuming

bisecting with git gives the following commit:

---
8d75feebe8cafe014d553d6f7653a13c8db737fa is the first bad commit
commit 8d75feebe8cafe014d553d6f7653a13c8db737fa
Author: Robert O'Callahan <robert@ocallahan.org>
Date:   Wed Aug 1 00:17:21 2012 +1200

    Bug 779715. Part 1: Add basic support for ProcessedMediaStreams. r=jesup
    The main thing this patch does is to add ProcessedMediaStream objects and
    MediaInputPorts connecting streams. ProcessedMediaStreams are an abstract
    class that doesn't constrain what processing is actually performed, except
    that for now we assume a stream's processing depends only on its inputs
    window of data between mCurrentTime and mStateComputedTime.
    This patch reorganizes the way the blocking status of each stream is computed.
    The streams are partitioned into groups so that every stream which can affect
    the blocking status of another stream is in the same group as that other stream.
    We also add a pass to order the streams by dependency so we can process the streams
    in order of dependency; this pass also identifies the streams that form part of a
    cycle.
    
    --HG--
    extra : rebase_source : c45c931a264e73f295642a934500bbeaa6448774

:040000 040000 9ec2cc6f184b9b2a14ca93317c14fc2775ca00b4 0cd45ea944fa09c23f9741c54bec158e9c69cfa9 M	content
Comment 1 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-22 20:33:57 PDT
(In reply to Alexandre LISSY from comment #0)
> $ adb logcat | grep -i "camera viewfinder"
> I/Gecko   (  241): camera viewfinder is unknown
> I/Gecko   (  241): camera viewfinder is consuming

That's weird. "unknown" seems wrong, since the only caller of NotifyConsumptionChanged is MediaStreamGraphImpl::UpdateConsumptionState, which passes
  MediaStreamListener::Consumption state =
      aStream->mIsConsumed ? MediaStreamListener::CONSUMED
      : MediaStreamListener::NOT_CONSUMED;
So I would love to see a stack of how we hit "unknown" there.
Comment 2 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-22 20:36:08 PDT
Created attachment 654506 [details] [diff] [review]
patch

Alexandre, can you try this patch with NSPR_LOG_MODULES=MediaStreamGraph:5 in your environment? It might possibly fix your bug, but I don't expect it to, but the log data would be helpful. Also, if you can get a stack for us hitting http://dxr.mozilla.org/mozilla-central/dom/camera/CameraPreview.cpp.html#l44 that would be helpful. I have absolutely no idea how we can hit that.
Comment 3 Alexandre LISSY :gerard-majax 2012-08-23 02:30:26 PDT
I'll give it a try, but FYI, I'm not the only one hitting this.
Comment 4 Alexandre LISSY :gerard-majax 2012-08-23 03:02:49 PDT
First, backtrace from GDB (your patch not applied):
Reading symbols from /home/alex/codaz/B2G/objdir-gecko/dist/bin/b2g...done.
Remote debugging from host 127.0.0.1
syscall () at bionic/libc/arch-arm/bionic/syscall.S:50
50	    ldmfd   sp!, {r4, r5, r6, r7}
(gdb) break CameraPreview.cpp
CameraPreview.cpp  cpp                cpp_t              
(gdb) break CameraPreview.cpp:45
Breakpoint 1 at 0x4094049e: file /home/alex/codaz/B2G/gecko/dom/camera/CameraPreview.cpp, line 45.
(gdb) continue 
Continuing.
[New Thread 77.241]
[Switching to Thread 77.241]

Breakpoint 1, CameraPreviewListener::NotifyConsumptionChanged (this=0x29ec008, aGraph=<value optimized out>, aConsuming=113)
    at /home/alex/codaz/B2G/gecko/dom/camera/CameraPreview.cpp:45
45	        state = "unknown";
(gdb) bt
#0  CameraPreviewListener::NotifyConsumptionChanged (this=0x29ec008, aGraph=<value optimized out>, aConsuming=113) at /home/alex/codaz/B2G/gecko/dom/camera/CameraPreview.cpp:45
#1  0x4094c1e6 in mozilla::MediaStreamGraphImpl::UpdateConsumptionState (this=0x20667c8, aStream=0x29ebee0) at /home/alex/codaz/B2G/gecko/content/media/MediaStreamGraph.cpp:572
#2  0x4094f8a4 in mozilla::MediaStreamGraphImpl::RunThread (this=0x20667c8) at /home/alex/codaz/B2G/gecko/content/media/MediaStreamGraph.cpp:1380
#3  0x4094faac in Run (this=<value optimized out>) at /home/alex/codaz/B2G/gecko/content/media/MediaStreamGraph.cpp:1512
#4  0x40c837fa in nsThread::ProcessNextEvent (this=0x2066898, mayWait=<value optimized out>, result=0x4ca65eb7) at /home/alex/codaz/B2G/gecko/xpcom/threads/nsThread.cpp:624
#5  0x40c63f06 in NS_ProcessNextEvent_P (thread=0xa0, mayWait=true) at /home/alex/codaz/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:220
#6  0x40c83c28 in nsThread::ThreadFunc (arg=<value optimized out>) at /home/alex/codaz/B2G/gecko/xpcom/threads/nsThread.cpp:257
#7  0x401909b4 in _pt_root (arg=<value optimized out>) at /home/alex/codaz/B2G/gecko/nsprpub/pr/src/pthreads/ptthread.c:156
#8  0x4004fc28 in __thread_entry (func=0x40190955 <_pt_root>, arg=0x20bc920, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#9  0x4004f77c in pthread_create (thread_out=<value optimized out>, attr=0xbe86c6b4, start_routine=0x40190955 <_pt_root>, arg=0x20bc920) at bionic/libc/bionic/pthread.c:357
#10 0x00000000 in ?? ()
Comment 5 Alexandre LISSY :gerard-majax 2012-08-23 04:42:39 PDT
Created attachment 654576 [details] [diff] [review]
Logcat with the patch applied

With updated gecko (master at ef6746907b0a6a5d3160e1b26272ba38c5fdef73), and the proposed changes, the preview works again. Not sure if the patch is related, I'll retry without. In the meantime, please find the log with the changes applied and environment set accordingly.
Comment 6 Alexandre LISSY :gerard-majax 2012-08-23 04:43:55 PDT
Please also not that during this latest trial, I had no "unknown" status popping.
Comment 7 Alexandre LISSY :gerard-majax 2012-08-23 04:46:05 PDT
Build just finished, without your proposed patch and with master at ef6746907b0a6a5d3160e1b26272ba38c5fdef73, preview is broken, and I get "unknown" status in logcat.
Comment 8 Alexandre LISSY :gerard-majax 2012-08-23 05:10:07 PDT
I've played a bit with gdb, and adding a breakpoint in MediaStreamGraphImpl::UpdateConsumptionState() inside the "if (state != aStream->mLastConsumptionState)" is never being hit. Meaning state == aStream->mLastConsumptionState.
Comment 9 Alexandre LISSY :gerard-majax 2012-08-23 05:11:20 PDT
And I see nowhere in the sourcecode where aStream->mLastConsumptionState is changed.
Comment 10 Alexandre LISSY :gerard-majax 2012-08-23 05:14:11 PDT
Also, playing with breakpoints, I saw that both "aStream->mIsConsumed = true;" in MediaStreamGraphImpl::DetermineWhetherStreamIsConsumed gets called.
Comment 11 Alexandre LISSY :gerard-majax 2012-08-23 05:19:03 PDT
Issue is fixed by only applying this part of your patch:

@@ -940,16 +939,17 @@ void
 MediaStreamGraphImpl::UpdateStreamOrder()
 {
   nsTArray<nsRefPtr<MediaStream> > oldStreams;
   oldStreams.SwapElements(mStreams);
   for (PRUint32 i = 0; i < oldStreams.Length(); ++i) {
     MediaStream* stream = oldStreams[i];
     stream->mHasBeenOrdered = false;
     stream->mKnowIsConsumed = false;
+    stream->mIsConsumed = false;
     stream->mIsOnOrderingStack = false;
     stream->mInBlockingSet = false;
     ProcessedMediaStream* ps = stream->AsProcessedStream();
     if (ps) {
       ps->mInCycle = false;
     }   
   }
Comment 12 Mike Habicher [:mikeh] (high bugzilla latency) 2012-08-23 10:52:18 PDT
(In reply to Alexandre LISSY from comment #11)
> Issue is fixed by only applying this part of your patch:
> 
> @@ -940,16 +939,17 @@ void
>  MediaStreamGraphImpl::UpdateStreamOrder()
>  {
>    nsTArray<nsRefPtr<MediaStream> > oldStreams;
>    oldStreams.SwapElements(mStreams);
>    for (PRUint32 i = 0; i < oldStreams.Length(); ++i) {
>      MediaStream* stream = oldStreams[i];
>      stream->mHasBeenOrdered = false;
>      stream->mKnowIsConsumed = false;
> +    stream->mIsConsumed = false;
>      stream->mIsOnOrderingStack = false;
>      stream->mInBlockingSet = false;
>      ProcessedMediaStream* ps = stream->AsProcessedStream();
>      if (ps) {
>        ps->mInCycle = false;
>      }   
>    }

I can confirm that this patch fixes a lot of problems (previews shutdown properly and objects are properly collected when the camera is closed); but I'm still seeing the preview stall eventually--all of the GraphicsLockedBuffers seem to be stuck in the DEQUEUED or RENDERING state.
Comment 13 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-23 16:57:28 PDT
(In reply to Alexandre LISSY from comment #4)
> (gdb) bt
> #0  CameraPreviewListener::NotifyConsumptionChanged (this=0x29ec008,
> aGraph=<value optimized out>, aConsuming=113) at
> /home/alex/codaz/B2G/gecko/dom/camera/CameraPreview.cpp:45
> #1  0x4094c1e6 in mozilla::MediaStreamGraphImpl::UpdateConsumptionState
> (this=0x20667c8, aStream=0x29ebee0) at
> /home/alex/codaz/B2G/gecko/content/media/MediaStreamGraph.cpp:572

I guess the compiler is optimizing mIsConsumed ? CONSUMED : NOT_CONSUMED away to nothing since the only valid values for mIsConsumed are 0 and 1, so uninitialized and invalid mIsConsumed is passed through.

Thanks a ton for your help.
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-23 17:03:19 PDT
Created attachment 654852 [details] [diff] [review]
fix
Comment 15 Mike Habicher [:mikeh] (high bugzilla latency) 2012-08-23 21:53:00 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> (In reply to Alexandre LISSY from comment #4)
> > (gdb) bt
> > #0  CameraPreviewListener::NotifyConsumptionChanged (this=0x29ec008,
> > aGraph=<value optimized out>, aConsuming=113) at
> > /home/alex/codaz/B2G/gecko/dom/camera/CameraPreview.cpp:45
> > #1  0x4094c1e6 in mozilla::MediaStreamGraphImpl::UpdateConsumptionState
> > (this=0x20667c8, aStream=0x29ebee0) at
> > /home/alex/codaz/B2G/gecko/content/media/MediaStreamGraph.cpp:572
> 
> I guess the compiler is optimizing mIsConsumed ? CONSUMED : NOT_CONSUMED
> away to nothing since the only valid values for mIsConsumed are 0 and 1, so
> uninitialized and invalid mIsConsumed is passed through.
> 
> Thanks a ton for your help.

Unfortunately even with this patch, the preview still freezes.

I've been instrumenting a lot of the code, and from what I can see, it looks like something gets confused in an upper layer.  The camera is blocked because there are no free buffers--all of the buffers are either DEQUEUED or RENDERING, which means they're either already in the pipe to be filled, or displayed, respectively.  A call to the GonkIOSurfaceImage dtor will mark the buffers FREE, but that never happens.  Meanwhile the media thread keeps on chugging (it sure does generate a lot of output. :)

I have a _very_ detailed log that shows what's going on, but it's 15MiB.

It prompts two questions:
1. Why is the camera holding onto so many buffers?
2. Why is media stream/renderer not consuming and releasing the frames?

This pastie shows a backtrace of what should be happening: http://pastebin.com/DieGstPZ
Comment 16 Mike Habicher [:mikeh] (high bugzilla latency) 2012-08-27 10:51:49 PDT
Created attachment 655645 [details]
Detailed log dump

Shows:
1. open camera app, which opens preview successfully
2. takes picture
3. restarts preview successfully
4. takes pictures
5. preview fails to restart
Comment 17 Mike Habicher [:mikeh] (high bugzilla latency) 2012-08-27 11:45:54 PDT
I've confirmed that if ReceiveFrame() is replaced with a stub that just immediately frees the preview data buffer, the preview never freezes.
Comment 18 Dietrich Ayala (:dietrich) 2012-08-28 20:37:59 PDT
The patch on this bug is waiting for 6 days now for review. Is it not the right approach, or still valid fix? Does something else need to happen per Mike's previous comments?
Comment 19 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-28 20:43:56 PDT
That patch needs to land. I just nagged Jesup.
Comment 20 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-28 20:45:18 PDT
(In reply to Mike Habicher [:mikeh] from comment #16)
> Shows:
> 1. open camera app, which opens preview successfully
> 2. takes picture
> 3. restarts preview successfully
> 4. takes pictures
> 5. preview fails to restart

In that log, only three frames get written to the final stream 4538c3d0. Are we really expecting to have only three frames available? Have we failed to free frames from the previous stream(s)?
Comment 21 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-28 21:32:45 PDT
We remove media stream 4538d920 from the graph only after 50 seconds, which is after the hang and long after we stopped using it. So maybe it (and 4655c240) have kept some Images alive. They probably aren't removed from the graph until cycle collection (or at least GC) runs, and maybe they aren't running very often here. Not sure yet if that's the problem, although since each stream holds onto at least the last Image, it's not good.

I see that at about 1.8 seconds (right after "Updating current time to ... real 1.752319") we suddenly allocate all available buffers (according to your "Buffer [0]: state DEQUEUED" logs, before time 1.806122. Why is that? That's less than 50ms so you should only get a couple of frames in that time?
Comment 22 Alexandre LISSY :gerard-majax 2012-08-28 23:10:39 PDT
With patch from bug 781892, I've been able to go up to 10 photos before getting killed (OOM probably)
Comment 23 Randell Jesup [:jesup] 2012-08-28 23:47:51 PDT
Comment on attachment 654852 [details] [diff] [review]
fix

Review of attachment 654852 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay; I got swamped in bugmail by some QA add-myself-to-bugs and triage-generated mails, and haven't dug out, and missed this one
Comment 24 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-29 04:27:37 PDT
I think we should consider this bug fixed by this patch. The problem with playback stalling is a different problem that deserves its own bug.
Comment 25 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-29 04:28:09 PDT
Comment on attachment 654852 [details] [diff] [review]
fix

Review of attachment 654852 [details] [diff] [review]:
-----------------------------------------------------------------

Fixes a serious regression introduced before the Aurora uplift.
Comment 26 Alexandre LISSY :gerard-majax 2012-08-29 05:37:04 PDT
Yep, the freeze is being tracked down in bug 781892.
Comment 27 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-29 15:21:45 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25)
> Comment on attachment 654852 [details] [diff] [review]
> fix
> 
> Review of attachment 654852 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Fixes a serious regression introduced before the Aurora uplift.

I don't see a comment for this landing on central yet, will wait for that to approve aurora uplift.
Comment 28 Ryan VanderMeulen [:RyanVM] 2012-08-29 17:21:34 PDT
https://hg.mozilla.org/mozilla-central/rev/3d7e89acf4c2
Comment 29 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-08-29 17:57:33 PDT
*** Bug 786081 has been marked as a duplicate of this bug. ***
Comment 30 Jason Smith [:jsmith] 2012-08-30 16:30:52 PDT
Verified on 8/30 Nightly build.
Comment 31 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-09-05 04:10:20 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/f9cafaf42af2

Note You need to log in before you can comment on or make changes to this bug.