Closed Bug 826829 Opened 7 years ago Closed 7 years ago

Crash when sending camera app to background: mozalloc_abort | pthread_mutex_unlock | ?? (mozalloc_abort() -- or -- abort() under libutils.so!android::RefBase::weakref_type::decWeak)

Categories

(Core :: DOM: Device Interfaces, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla21
blocking-b2g tef+
blocking-basecamp -
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 + verified
b2g18-v1.0.0 --- fixed

People

(Reporter: cjones, Assigned: sotaro)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash] [cr 437569], QARegressExclude)

Crash Data

Attachments

(2 files, 10 obsolete files)

1001 bytes, patch
Details | Diff | Splinter Review
3.84 KB, patch
Details | Diff | Splinter Review
Attached file Minidump + dumplookup (obsolete) —
+++ This bug was initially created as a clone of Bug #824224 +++

Seen during MW1 [1].

What happened was, after step 8, when pressing HOME to send the camera to the background, it crashed.

[1] https://wiki.mozilla.org/B2G/Memory_acceptance_criteria#MW1:_Music_stays_alive
dumplookup is nice and juicy and useful.  Trimming what appears to be noise, I see

0x43761b14: libutils.so!android::RefBase::weakref_type::decWeak [RefBase.cpp : 412 + 0x1]
0x43761b24: libutils.so!android::RefBase::decStrong [RefBase.cpp : 363 + 0x1]
0x43761b44: libmemalloc.so!gralloc::PmemKernelAlloc::map_buffer [pmemalloc.cpp : 370 + 0x1]
0x43761b54: libutils.so!android::RefBase::weakref_type::decWeak [RefBase.cpp : 412 + 0x1]
0x43761bb4: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 1125 + 0x9]
0x43761bcc: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 1033 + 0x5]
0x43761be4: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 640 + 0xd]
0x43761c0c: libxul.so!mozilla::layers::SurfaceDescriptor::~SurfaceDescriptor [LayersSurfaces.cpp : 487 + 0x1]
0x43761c14: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 833 + 0xd]
0x43761c64: libxul.so!mozilla::layers::PImageContainerChild::FatalError [PImageContainerChild.cpp : 528 + 0x1]
0x43761c74: libxul.so!mozilla::layers::PImageContainerChild::OnMessageReceived [PImageContainerChild.cpp : 383 + 0x9]
0x43761cdc: libxul.so!mozilla::layers::PCompositorChild::OnMessageReceived [PCompositorChild.cpp : 637 + 0x9]
0x43761cec: libxul.so!mozilla::layers::PCompositorChild::OnMessageReceived [PCompositorChild.cpp : 630 + 0x1]
0x43761cfc: libxul.so!mozilla::ipc::AsyncChannel::OnDispatchMessage [AsyncChannel.cpp : 473 + 0xb]

This is in code that's trying to look up a PGrallocBufferChild.  I didn't have the patches from bug 824224 applied and I didn't look at the logcat, so I wasn't able to see the abort message.

If the decWeak() can be believed, then this could be a crash at

    const int32_t c = android_atomic_dec(&impl->mWeak);
    LOG_ASSERT(c >= 1, "decWeak called on %p too many times", this);

Alternatively, we could be crashing trying to look up a gralloc buffer actor.
blocking-basecamp: --- → ?
Component: DOM: IndexedDB → DOM: Device Interfaces
Crash Signature: [@ mozalloc_abort | pthread_mutex_unlock]
Keywords: crash
Whiteboard: [b2g-crash]
blocking-basecamp: ? → +
Summary: Crash: mozalloc_abort | pthread_mutex_unlock | ?? (abort() under libutils.so!android::RefBase::weakref_type::decWeak) → Crash when sending camera app to background: mozalloc_abort | pthread_mutex_unlock | ?? (mozalloc_abort() -- or -- abort() under libutils.so!android::RefBase::weakref_type::decWeak)
Assignee: nobody → sotaro.ikeda.g
I could easily reproduce the Camera app crash by repating following [1][2] on Unagi phone.

1. start Camera application. Wait until a camera preview becomes active. 
2. push home button

It happens when I repeat [1][2] about 4-5 times. It is a same crash, I think.
When camera app crashed, I always see following log.

I/Gecko   (  482): [Child 482] ###!!! ABORT: [PImageContainerChild] abort()ing as a result: file /home/sotaro/build-b2g/B2G/objdir-gecko/ipc/ipdl/PImageContainerChild.cpp, line 527
F/libc    (  482): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
I trid to use GDB to debug this. But it takes very long time to connect GDB in Camera app's case and failed to debug soon. Therefore, I debug this by adding a lot of log-prints.

From logouts, it seems that the abort happens like following sequence.
-[1] ImageContainerChild calls ImageContainerChild::SendPublishImage()
-[2] camera app stops preview
-[3] GonkNativeWindow free all Gralloc buffers
-[4] GrallocBufferActors are freed by ImageBridgeChild
-[5] ImageContainerParent receive ImageContainerParent::RecvPublishImage()
-[6] ImageContainerParent sends back a SharedImage that refers alredy deleted GrallocBufferActor
-[7] ImageContainerChild received the message from parent, but failed to get SurfaceDescriptorGralloc because it is alredy deleted
Following causes to abort

> -[6] ImageContainerParent sends back a SharedImage that refers alredy deleted GrallocBufferActor
There is a function"ImageContainerChild::SendFlush()" to remove SharedImage of parent side. But ImageContainerParent::RecvFlush() was receved after child side aborted when abort happened.
(In reply to Sotaro Ikeda [:sotaro] from comment #2)
> I could easily reproduce the Camera app crash by repating following [1][2]
> on Unagi phone.
> 
> 1. start Camera application. Wait until a camera preview becomes active. 
> 2. push home button
> 
> It happens when I repeat [1][2] about 4-5 times. It is a same crash, I think.

Yep, that's exactly this bug.
I got following information from Kan-Ru Chen
 - Rendering and pending buffers are kept by GonkNativeWindow

So, the problem should be prevented by it... I need to investigate more.
While recording videos and switching between Camera and Gallery I see the PImageContainerChild abort mentioned here but also:

[Child 2325] ###!!! ABORT: [PBlobStreamChild] abort()ing as a result: file /home/tim/workspace/b2g-desktop/objdir-gecko-unagi/ipc/ipdl/PBlobStreamChild.cpp, line 364

Is this related or should I file a new bug? I do have a backtrace which also contains the first frame from above (mozilla::ipc::AsyncChannel::OnDispatchMessage).
(In reply to Tim Taubert [:ttaubert] from comment #9)
> While recording videos and switching between Camera and Gallery I see the
> PImageContainerChild abort mentioned here but also:
> 
> [Child 2325] ###!!! ABORT: [PBlobStreamChild] abort()ing as a result: file
> /home/tim/workspace/b2g-desktop/objdir-gecko-unagi/ipc/ipdl/PBlobStreamChild.
> cpp, line 364
> 
> Is this related or should I file a new bug? I do have a backtrace which also
> contains the first frame from above
> (mozilla::ipc::AsyncChannel::OnDispatchMessage).

It seems like that it is a different bug. Can you file a new bug?
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> It seems like that it is a different bug. Can you file a new bug?

Yeah, the Gallery just crashed again in PBlobStreamChild.cpp while in the background right after I started recording. Doesn't seem related to me, will file a new bug.
I added more printout to confirm following.

> I got following information from Kan-Ru Chen
> - Rendering and pending buffers are kept by GonkNativeWindow

"BufferSlot::RENDERING" just prevents to delete a buffer in GonkNativeWindow::freeBufferLocked(int i).
The buffer is deleted after when CameraGraphicBuffer loses reference from others.

this log shows that.
one cause of abort was because of async ImageContainerChild::SendFlush(). This patch change ImageContainerChild::SendFlush() to synchronous.
Blocks: 827833
By applying attachment 699225 [details] [diff] [review] , I face following problem.
- Sometimes, GonkNativeWindow::freeBufferLocked() falls into deadlock state

It seems that attachment 699225 [details] [diff] [review] hit another defect.
get log by following command
> adb shell logcat -v threadtime > log.txt

dead lock happens like following sequence.
- [1] camera application stops preview
- [2] GonkNativeWindow frees all gralloc buffers by calling GonkNativeWindow::abandon() //camera thread
- [3] GonkNativeWindow require mutex lock. //camera thread
- [4] GonkNativeWindow::freeAllBuffersLocked() tries to free all gralloc buffers except rendering state buffers.  //camera thread
- [5] ImageBridgeChild::DeallocSurfaceDescriptorGralloc() is called to deallocate gralloc buffer.
- [6] DeallocSurfaceDescriptorGralloc() post a task of deallocate gralloc buffer to ImageBridge's thread
            and wait until the buffer is deleted
- [7] ImageBridge's thread processes some tasks on the thread.
          one of task is ImageContainerChild::SetIdleNow()
- [8] ImageContainerChild::SetIdleNow() delete gralloc buffer synchronously by change of attachment 699225 [details] [diff] [review]
       and frees some GonkIOSurfaceImage   // ImageBridge's thread
- [9] In GonkIOSurfaceImage::~GonkIOSurfaceImage(), CameraGraphicBuffer::Unlock() is called    // ImageBridge's thread
- [10] In CameraGraphicBuffer::Unlock(), GonkNativeWindow::returnBuffer() is called    // ImageBridge's thread
- [11] GonkNativeWindow::returnBuffer() tries to hold the mutex lock. But it is alrealy hold by [3] in different thread !!!!!!!!
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> By applying attachment 699225 [details] [diff] [review] , I face following
> problem.
> - Sometimes, GonkNativeWindow::freeBufferLocked() falls into deadlock state
> 
> It seems that attachment 699225 [details] [diff] [review] hit another defect.

The deadlock seems similar to Bug 826765
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> By applying attachment 699225 [details] [diff] [review] , I face following
> problem.
> - Sometimes, GonkNativeWindow::freeBufferLocked() falls into deadlock state
> 
> It seems that attachment 699225 [details] [diff] [review] hit another defect.

I locally confirmed that the deadlock could be prevented by following change.

- dispatch GonkNativeWindow::returnBuffer() to a different thread than ImageBridge's thread

Right now, I use main thread for it. It is better to assign dedicated thread for it, because performance is very important.
I found 2 patterns to the aborts. Attachment 699225 [details] [diff] tried to address [1]. It is better to address [2] in different bug. 

[1] abort during stopping preview
After calling ImageContainerChild::SendPublishImage(), ImageContainerChild calls ImageContainerChild::SendFlush() and frees child side Graloc buffers.
Then on parent side, ImageContainerParent::RecvPublishImage() is called and send back previous image. But it is already deallocated on child side.

[2] abort on re-starting preview
camera app stop preview correctly and try to start preview again. 
There is a Image(GonkIOSurfaceImage) from last preview. And the Image references to a CameraGraphicBuffer.
When prevew starts, ImageContainer::SetCurrentImage() is called.
And within it, ImageContainerChild::SendImageAsync() is called and mActiveImage is replaced to new Image.
When mActiveImage is replaced, there is no refence to the GonkIOSurfaceImage from previous preview and the the GonkIOSurfaceImage is deleted.
During the destruction of the GonkIOSurfaceImage, CameraGraphicBuffer::Unlock() is called and Gralloc buffer is deleted.
After that on parent side, ImageContainerParent::RecvPublishImage() is called and send back previous image. But it is already deallocated on child side.
Blocks: 828289
Attachment #699225 - Attachment description: change ImageContainerChild::SendFlush() to sync → Part 1 - change ImageContainerChild::SendFlush() to sync
this patch prevent deadlock
Attachment #699225 - Flags: review?(kchen)
Attachment #699888 - Flags: review?(kchen)
I'll try to review this tonight.
No longer blocks: 828289
Depends on: 828289
 
For [2], I created Bug 828289.

> [2] abort on re-starting preview
> camera app stop preview correctly and try to start preview again. 
> There is a Image(GonkIOSurfaceImage) from last preview. And the Image
> references to a CameraGraphicBuffer.
> When prevew starts, ImageContainer::SetCurrentImage() is called.
> And within it, ImageContainerChild::SendImageAsync() is called and
> mActiveImage is replaced to new Image.
> When mActiveImage is replaced, there is no refence to the GonkIOSurfaceImage
> from previous preview and the the GonkIOSurfaceImage is deleted.
> During the destruction of the GonkIOSurfaceImage,
> CameraGraphicBuffer::Unlock() is called and Gralloc buffer is deleted.
> After that on parent side, ImageContainerParent::RecvPublishImage() is
> called and send back previous image. But it is already deallocated on child
> side.
We'll definitely take a fix for these 3 related bugs so please don't stop working on them, but we won't hold the release for a camera app crasher that the user can easily work around.
blocking-basecamp: + → -
tracking-b2g18: --- → +
Comment on attachment 699888 [details] [diff] [review]
Part 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 699888 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by review!

I'm concerned about the thread-safety of 'sCameraGraphicBufferThreadRefCount'.  It looks like Start*() is called from the CameraGraphicBuffer (CGB) ctor, which is called from GonkNativeWindow::getCurrentBuffer() running on the camera driver's frame thread; but Release*() is called from sCameraGraphicBufferThread.  In this case, you should be using PR_ATOMIC_INCREMENT/_DECREMENT[1] on your reference counter.

Alternatively, you could add an nsRefPtr<nsIThread> to each CameraGraphicBuffer, and let the existing framework handle reference counting for you.  I just looked at [2] and nsThread uses NS_IMPL_THREADSAFE*().  I think this would simplify your code considerably.

1. See http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/pratom.h#66
2. http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#150

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +48,5 @@
> +    sCameraGraphicBufferThreadRefCount = 1;
> +
> +    nsresult rv = NS_NewNamedThread("CamGraphicBuf",
> +                                    getter_AddRefs(sCameraGraphicBufferThread));
> +    NS_ENSURE_SUCCESS(rv, );

Just use NS_ENSURE_SUCCESS_VOID(rv).

@@ +54,5 @@
> +
> +
> +static void ReleaseCameraGraphicBufferThread()
> +{
> +  if(--sCameraGraphicBufferThreadRefCount == 0) {

Put a check in here to make sure sCameraGraphicBufferThreadRefCount >= 0.

e.g.
if (NS_UNLIKELY(sCameraGraphicBufferThreadRefCount < 0)) {
  NS_ERROR("sCameraGraphicBufferThreadRefCount is < 0");
  MOZ_CRASH();
}

@@ +55,5 @@
> +
> +static void ReleaseCameraGraphicBufferThread()
> +{
> +  if(--sCameraGraphicBufferThreadRefCount == 0) {
> +    nsresult rv = sCameraGraphicBufferThread->Shutdown();

Based on your current implementation of StartCameraGraphicBufferThread(), --sCameraGraphicBufferThreadRefCount == 0 is not sufficient to guarantee that sCameraGraphicBufferThread != nullptr.  (i.e. if rv != NS_OK in the ctor.)

@@ +56,5 @@
> +static void ReleaseCameraGraphicBufferThread()
> +{
> +  if(--sCameraGraphicBufferThreadRefCount == 0) {
> +    nsresult rv = sCameraGraphicBufferThread->Shutdown();
> +    NS_ENSURE_SUCCESS(rv, );

NS_ENSURE_SUCCESS_VOID(rv)

@@ +649,5 @@
> +void CameraGraphicBuffer::Unlock() MOZ_OVERRIDE
> +{
> +    nsCOMPtr<nsIRunnable> returnBufferTask = new ReturnBufferTask(this);
> +    nsresult rv = sCameraGraphicBufferThread->Dispatch(returnBufferTask, NS_DISPATCH_NORMAL);
> +    NS_ENSURE_SUCCESS(rv, );

NS_ENSURE_SUCCESS_VOID(rv)

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +240,5 @@
>      typedef mozilla::layers::SurfaceDescriptor SurfaceDescriptor;
>      typedef mozilla::layers::ImageBridgeChild ImageBridgeChild;
>  
>  public:
> +public:

Remove duplicate 'public:' clause.

@@ +269,5 @@
> +        : mCameraGraphicBuffer(aCameraGraphicBuffer)
> +    {
> +    }
> +
> +    virtual ~ReturnBufferTask()

Make this protected.
Comment on attachment 699888 [details] [diff] [review]
Part 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 699888 [details] [diff] [review]:
-----------------------------------------------------------------

One more question..

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +634,5 @@
> +  , mGeneration(aGeneration)
> +  , mLocked(true)
> +{
> +    DOM_CAMERA_LOGT("%s:%d : this=%p\n", __func__, __LINE__, this);
> +    StartCameraGraphicBufferThread();

Since we are creating and releasing the camera graphic buffer thread on the creation and destruction of camera graphic buffer, could we just hold a reference to the thread here and let nsCOMPtr to handle the reference count?

Like

ctor:
  mThread = GetCameraGraphicBufferThread();

dtor:
  mThread = nullptr;

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +240,5 @@
>      typedef mozilla::layers::SurfaceDescriptor SurfaceDescriptor;
>      typedef mozilla::layers::ImageBridgeChild ImageBridgeChild;
>  
>  public:
> +public:

Remove this.

@@ +271,5 @@
> +    }
> +
> +    virtual ~ReturnBufferTask()
> +    {
> +    }

Remove this.
Attachment #699225 - Flags: review?(kchen) → review+
> @@ +271,5 @@
> > +    }
> > +
> > +    virtual ~ReturnBufferTask()
> > +    {
> > +    }
> 
> Remove this.

What mikeh said.
change
- remove thread ref counter
- GonkNativeWindow creates and hold a thread
Attachment #699888 - Attachment is obsolete: true
Attachment #699888 - Flags: review?(kchen)
change
- remove thread ref counter
- GonkNativeWindow creates and hold a thread
Attachment #700429 - Attachment is obsolete: true
Attachment #700434 - Flags: review?(mhabicher)
Comment on attachment 700434 [details] [diff] [review]
Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 700434 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by!

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +606,5 @@
> +                                    getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS_VOID(rv);
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()

nsCOMPtr<nsIThread>

@@ +608,5 @@
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> +{
> +    return mThread;

I think we could create the thread here lazily.

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +82,5 @@
>      // Release all internal buffers
>      void abandon();
>  
>      SurfaceDescriptor *getSurfaceDescriptorFromBuffer(ANativeWindowBuffer* buffer);
> +    nsIThread* getGonkNativeWindowThread();

nsCOMPtr<nsIThread>

@@ +284,5 @@
> +    virtual ~ReturnBufferTask()
> +    {
> +    }
> +
> +protected:

Remove this one.
Comment on attachment 700434 [details] [diff] [review]
Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 700434 [details] [diff] [review]:
-----------------------------------------------------------------

Good work.  This looks much simpler.  Please fix up those issues and this will probably be ready to land after the next review.

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +603,5 @@
> +void GonkNativeWindow::startGonkNativeWindowThread()
> +{
> +    nsresult rv = NS_NewNamedThread("CamGraphicBuf",
> +                                    getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS_VOID(rv);

Since there's not much we can do if this fails, it's probably better to assert here.  Something like:

if (NS_UNLIKELY(NS_FAILED(rv))) {
  NS_ERROR("Failed to start camera graphic buffer disposal thread");
  MOZ_CRASH();
}

@@ +606,5 @@
> +                                    getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS_VOID(rv);
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()

already_AddRefed<...> ... ?

@@ +608,5 @@
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> +{
> +    return mThread;

I think it's better to create it when we init the GonkNativeWindow, so that we don't have to keep checking for a thread on every new CGB.

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +82,5 @@
>      // Release all internal buffers
>      void abandon();
>  
>      SurfaceDescriptor *getSurfaceDescriptorFromBuffer(ANativeWindowBuffer* buffer);
> +    nsIThread* getGonkNativeWindowThread();

kanru: shouldn't this be
already_AddRefed<nsIThread> getGonkNativeWindowThread(); ?
Whiteboard: [b2g-crash] → [b2g-crash] [cr 437569]
(In reply to Mike Habicher [:mikeh] from comment #29)
> > +    nsIThread* getGonkNativeWindowThread();
> 
> kanru: shouldn't this be
> already_AddRefed<nsIThread> getGonkNativeWindowThread(); ?

We are not giving up the ownership so it's nsCOMPtr<nsIThread>
(In reply to Mike Habicher [:mikeh] from comment #29)
> Comment on attachment 700434 [details] [diff] [review]
> Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's
> thread
> 
> Review of attachment 700434 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> > +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> 
> already_AddRefed<...> ... ?

Actually, never mind--we're both wrong.  You want the returned object to get a new reference when it's stuffed into an nsCOMPtr<>, so in that case, a raw nsIThread* is correct.

See: https://developer.mozilla.org/en-US/docs/Using_nsCOMPtr/Getting_Started_Guide#Using_nsCOMPtr

(Specfically, look under "nsCOMPtr<T> f() don't return an nsCOMPtr".)
(In reply to Kan-Ru Chen [:kanru] from comment #28)
> Comment on attachment 700434 [details] [diff] [review]
> Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's
> thread
> 
> Review of attachment 700434 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +608,5 @@
> > +}
> > +
> > +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> > +{
> > +    return mThread;
> 
> I think we could create the thread here lazily.

I am going to keep startGonkNativeWindowThread(), because Sometimes, creating a thread needs a longer time. It is not good for realtime performance.
There's actually no need for getGonkNativeWindowThread() -- you're already in the class, just use mThread directly!
(In reply to Mike Habicher [:mikeh] from comment #33)
> There's actually no need for getGonkNativeWindowThread() -- you're already
> in the class, just use mThread directly!

As I talked to you, getGonkNativeWindowThread() is necessary to pass a thread pointer from GonkNativeWindow to CameraGraphicBuffer.
reflect comment
 - assert when failed to create a thread
 - remove redundant "protected:" in GonkNativeWindow.h
Attachment #700434 - Attachment is obsolete: true
Attachment #700434 - Flags: review?(mhabicher)
Comment on attachment 700489 [details] [diff] [review]
Part 2 rev 3 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 700489 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  r+ with that one nit fixed.

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +279,5 @@
> +    {
> +        mCameraGraphicBuffer->UnlockNow();
> +        return NS_OK;
> +    }
> +protected:

Nit: please insert a blank line before the 'protected' clause.
Attachment #700489 - Flags: review+
Attachment #700489 - Flags: review+ → review?(mhabicher)
Attachment #698059 - Attachment is obsolete: true
Attachment #699083 - Attachment is obsolete: true
Attachment #699225 - Attachment is obsolete: true
Attachment #699293 - Attachment is obsolete: true
Attachment #700489 - Attachment is obsolete: true
Attachment #700489 - Flags: review?(mhabicher)
mikeh, I start to re-think about Part2. attachment 700516 [details] [diff] [review] [diff] [review] affect very badly to H.264 video rendering. OMXCodec becomes to abort more often than before. I am going to think about to separate thread for ImageBridgeChild and ImageContainerChild.
 
When OMXCodec::freeBuffersOnPort() is called, all gralloc buffers needed to be within OMXCodec or GonkNativeWindow. attachment 700516 [details] [diff] [review] [diff] [review] is going to keep gralloc buffer longer time and is going to fail a buffer status check in OMXCodec::freeBuffersOnPort().
Attachment #700516 - Attachment is obsolete: true
sotaro, could the OMXCodec instabilities be due to the sync flush change?
Attachment #700514 - Attachment is obsolete: true
(In reply to Mike Habicher [:mikeh] from comment #40)
> sotaro, could the OMXCodec instabilities be due to the sync flush change?

Yes, I confirmed that.
(In reply to Mike Habicher [:mikeh] from comment #40)
> sotaro, could the OMXCodec instabilities be due to the sync flush change?

I checked about the problem. It hits another defect just because of function calls timing change.

- In ImageContainer::SetCurrentImage(), it the argument(aImage) is null pointer, ImageContainerChild::DispatchSetIdle() is called.
- ImageContainerChild::DispatchSetIdle() dispatched ImageContainerChild::SetIdleNow() to ImageBridge thread. It is asynchronous.
- In ImageContainerChild::SetIdleNow(), ImageContainerChild::SendFlush().
  SendFlush() is asynchronous compared to SetCurrentImage(). 

When the problem happens, SendFlush() is called after OMXCodec::stop().
After the SendFlush(), a buffer is returned to OMXCodec. But OMXCodec is already in shutdown state.
(In reply to Sotaro Ikeda [:sotaro] from comment #42)
> (In reply to Mike Habicher [:mikeh] from comment #40)
> > sotaro, could the OMXCodec instabilities be due to the sync flush change?
> 
> I checked about the problem. It hits another defect just because of function
> calls timing change.

To solve the problem, it is necessary to change ImageContainerChild::DispatchSetIdle() to synchronous,
(In reply to Mike Habicher [:mikeh] from comment #40)
> sotaro, could the OMXCodec instabilities be due to the sync flush change?

It seems that this problems is Bug 823324.
Attachment #700514 - Attachment is obsolete: false
fix OMXCodec instabilities be due to the sync flush change
After apply in attachment 700950 [details] [diff] [review]. I could not re-produce the abort and deadlock.
There is still a risk of deadlock but it becomes rarely happens, I assume because shutdown happens sequentially as expected.
Attachment #700950 - Flags: review?(kchen)
(In reply to Sotaro Ikeda [:sotaro] from comment #46)
>
> There is still a risk of deadlock but it becomes rarely happens, I assume
> because shutdown happens sequentially as expected.

What's involved in eliminating this risk?
Attachment #700950 - Flags: review?(kchen) → review+
Before applying the patch, ImageContainerChild::SetIdleNow() and GonkNativeWindow::abandon() could happen at the same time. Because SetIdleNow() get called asynchronously. And deadlock happened when both functions called at the same time.

After applying the patch, SetIdleNow() called synchronously. Then both functions do not called at the same time.
landable patch
carry kchen: review+ from previous patch
Attachment #700950 - Attachment is obsolete: true
Comment on attachment 700993 [details] [diff] [review]
Part2 rev2 - change DispatchSetIdle() to synchronous SetIdle()

Review of attachment 700993 [details] [diff] [review]:
-----------------------------------------------------------------

Note that this patch doesn't apply cleanly on m-c.

::: gfx/layers/ipc/ImageContainerChild.h
@@ +104,5 @@
>     */
>    void DispatchDestroy();
>  
>    /**
> +   * flush and deallocate the shared images in the pool.

Fix the comment.
kanru, are there something I need to do more?
Whiteboard: [b2g-crash] [cr 437569] → [b2g-crash] [cr 437569], QARegressExclude
Keywords: checkin-needed
nominate to "blocking-b2g: tef"
Camera crashes lots without this fix.
blocking-b2g: --- → tef?
Will this fix the issue in bug 827833 as well?
(In reply to Mike Habicher [:mikeh] from comment #56)
> Will this fix the issue in bug 827833 as well?

Yes, I think.
blocking-b2g: tef? → tef+
https://hg.mozilla.org/mozilla-central/rev/9bc194aa7e14
https://hg.mozilla.org/mozilla-central/rev/4966fa10cf1d
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Can we uplift this to b2g18?
Yes, it is necessary to uplift patches to b2g18.
But I do not have permission to do it. Does someone could uplift them?
(In reply to Sotaro Ikeda [:sotaro] from comment #60)
> Yes, it is necessary to uplift patches to b2g18.
> But I do not have permission to do it. Does someone could uplift them?

I've heard that if the bug has "blocking-b2g:tef+", it is automatically uplifted.
(In reply to Sotaro Ikeda [:sotaro] from comment #61)
> I've heard that if the bug has "blocking-b2g:tef+", it is automatically
> uplifted.

Yes, the checkin fairies take care of it ;)
No longer depends on: 828289
Duplicate of this bug: 828289
Landed on mozilla-b2g18/gaia master prior to the 1/25 branching to mozilla-b2g18_v1_0_0/v1.0.0, updating status-b2g-v1.0.0 to fixed.
verified with 1.1 build of 20130322070202
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.