Crash when sending camera app to background: mozalloc_abort | pthread_mutex_unlock | ?? (mozalloc_abort() -- or -- abort() under libutils.so!android::RefBase::weakref_type::decWeak)

VERIFIED FIXED in Firefox 21

Status

()

Core
DOM: Device Interfaces
--
critical
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: cjones, Assigned: sotaro)

Tracking

({crash})

unspecified
mozilla21
ARM
Gonk (Firefox OS)
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, blocking-basecamp:-, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18+ verified, b2g18-v1.0.0 fixed)

Details

(Whiteboard: [b2g-crash] [cr 437569], QARegressExclude, crash signature)

Attachments

(2 attachments, 10 obsolete attachments)

1001 bytes, patch
Details | Diff | Splinter Review
3.84 KB, patch
Details | Diff | Splinter Review
Created attachment 698059 [details]
Minidump + dumplookup

+++ This bug was initially created as a clone of Bug #824224 +++

Seen during MW1 [1].

What happened was, after step 8, when pressing HOME to send the camera to the background, it crashed.

[1] https://wiki.mozilla.org/B2G/Memory_acceptance_criteria#MW1:_Music_stays_alive
dumplookup is nice and juicy and useful.  Trimming what appears to be noise, I see

0x43761b14: libutils.so!android::RefBase::weakref_type::decWeak [RefBase.cpp : 412 + 0x1]
0x43761b24: libutils.so!android::RefBase::decStrong [RefBase.cpp : 363 + 0x1]
0x43761b44: libmemalloc.so!gralloc::PmemKernelAlloc::map_buffer [pmemalloc.cpp : 370 + 0x1]
0x43761b54: libutils.so!android::RefBase::weakref_type::decWeak [RefBase.cpp : 412 + 0x1]
0x43761bb4: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 1125 + 0x9]
0x43761bcc: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 1033 + 0x5]
0x43761be4: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 640 + 0xd]
0x43761c0c: libxul.so!mozilla::layers::SurfaceDescriptor::~SurfaceDescriptor [LayersSurfaces.cpp : 487 + 0x1]
0x43761c14: libxul.so!mozilla::layers::PImageContainerChild::Read [PImageContainerChild.cpp : 833 + 0xd]
0x43761c64: libxul.so!mozilla::layers::PImageContainerChild::FatalError [PImageContainerChild.cpp : 528 + 0x1]
0x43761c74: libxul.so!mozilla::layers::PImageContainerChild::OnMessageReceived [PImageContainerChild.cpp : 383 + 0x9]
0x43761cdc: libxul.so!mozilla::layers::PCompositorChild::OnMessageReceived [PCompositorChild.cpp : 637 + 0x9]
0x43761cec: libxul.so!mozilla::layers::PCompositorChild::OnMessageReceived [PCompositorChild.cpp : 630 + 0x1]
0x43761cfc: libxul.so!mozilla::ipc::AsyncChannel::OnDispatchMessage [AsyncChannel.cpp : 473 + 0xb]

This is in code that's trying to look up a PGrallocBufferChild.  I didn't have the patches from bug 824224 applied and I didn't look at the logcat, so I wasn't able to see the abort message.

If the decWeak() can be believed, then this could be a crash at

    const int32_t c = android_atomic_dec(&impl->mWeak);
    LOG_ASSERT(c >= 1, "decWeak called on %p too many times", this);

Alternatively, we could be crashing trying to look up a gralloc buffer actor.
blocking-basecamp: --- → ?
Component: DOM: IndexedDB → DOM: Device Interfaces

Updated

5 years ago
Crash Signature: [@ mozalloc_abort | pthread_mutex_unlock]
Keywords: crash
Whiteboard: [b2g-crash]
blocking-basecamp: ? → +
Summary: Crash: mozalloc_abort | pthread_mutex_unlock | ?? (abort() under libutils.so!android::RefBase::weakref_type::decWeak) → Crash when sending camera app to background: mozalloc_abort | pthread_mutex_unlock | ?? (mozalloc_abort() -- or -- abort() under libutils.so!android::RefBase::weakref_type::decWeak)
(Assignee)

Updated

5 years ago
Assignee: nobody → sotaro.ikeda.g
(Assignee)

Comment 2

5 years ago
I could easily reproduce the Camera app crash by repating following [1][2] on Unagi phone.

1. start Camera application. Wait until a camera preview becomes active. 
2. push home button

It happens when I repeat [1][2] about 4-5 times. It is a same crash, I think.
(Assignee)

Comment 3

5 years ago
When camera app crashed, I always see following log.

I/Gecko   (  482): [Child 482] ###!!! ABORT: [PImageContainerChild] abort()ing as a result: file /home/sotaro/build-b2g/B2G/objdir-gecko/ipc/ipdl/PImageContainerChild.cpp, line 527
F/libc    (  482): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
(Assignee)

Comment 4

5 years ago
I trid to use GDB to debug this. But it takes very long time to connect GDB in Camera app's case and failed to debug soon. Therefore, I debug this by adding a lot of log-prints.

From logouts, it seems that the abort happens like following sequence.
-[1] ImageContainerChild calls ImageContainerChild::SendPublishImage()
-[2] camera app stops preview
-[3] GonkNativeWindow free all Gralloc buffers
-[4] GrallocBufferActors are freed by ImageBridgeChild
-[5] ImageContainerParent receive ImageContainerParent::RecvPublishImage()
-[6] ImageContainerParent sends back a SharedImage that refers alredy deleted GrallocBufferActor
-[7] ImageContainerChild received the message from parent, but failed to get SurfaceDescriptorGralloc because it is alredy deleted
(Assignee)

Comment 5

5 years ago
Following causes to abort

> -[6] ImageContainerParent sends back a SharedImage that refers alredy deleted GrallocBufferActor
(Assignee)

Comment 6

5 years ago
There is a function"ImageContainerChild::SendFlush()" to remove SharedImage of parent side. But ImageContainerParent::RecvFlush() was receved after child side aborted when abort happened.
(In reply to Sotaro Ikeda [:sotaro] from comment #2)
> I could easily reproduce the Camera app crash by repating following [1][2]
> on Unagi phone.
> 
> 1. start Camera application. Wait until a camera preview becomes active. 
> 2. push home button
> 
> It happens when I repeat [1][2] about 4-5 times. It is a same crash, I think.

Yep, that's exactly this bug.
(Assignee)

Comment 8

5 years ago
I got following information from Kan-Ru Chen
 - Rendering and pending buffers are kept by GonkNativeWindow

So, the problem should be prevented by it... I need to investigate more.
While recording videos and switching between Camera and Gallery I see the PImageContainerChild abort mentioned here but also:

[Child 2325] ###!!! ABORT: [PBlobStreamChild] abort()ing as a result: file /home/tim/workspace/b2g-desktop/objdir-gecko-unagi/ipc/ipdl/PBlobStreamChild.cpp, line 364

Is this related or should I file a new bug? I do have a backtrace which also contains the first frame from above (mozilla::ipc::AsyncChannel::OnDispatchMessage).
(Assignee)

Comment 10

5 years ago
(In reply to Tim Taubert [:ttaubert] from comment #9)
> While recording videos and switching between Camera and Gallery I see the
> PImageContainerChild abort mentioned here but also:
> 
> [Child 2325] ###!!! ABORT: [PBlobStreamChild] abort()ing as a result: file
> /home/tim/workspace/b2g-desktop/objdir-gecko-unagi/ipc/ipdl/PBlobStreamChild.
> cpp, line 364
> 
> Is this related or should I file a new bug? I do have a backtrace which also
> contains the first frame from above
> (mozilla::ipc::AsyncChannel::OnDispatchMessage).

It seems like that it is a different bug. Can you file a new bug?
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> It seems like that it is a different bug. Can you file a new bug?

Yeah, the Gallery just crashed again in PBlobStreamChild.cpp while in the background right after I started recording. Doesn't seem related to me, will file a new bug.
(Assignee)

Comment 12

5 years ago
Created attachment 699083 [details]
logout when abort gets called (manually added logouts)

I added more printout to confirm following.

> I got following information from Kan-Ru Chen
> - Rendering and pending buffers are kept by GonkNativeWindow

"BufferSlot::RENDERING" just prevents to delete a buffer in GonkNativeWindow::freeBufferLocked(int i).
The buffer is deleted after when CameraGraphicBuffer loses reference from others.

this log shows that.
(Assignee)

Comment 13

5 years ago
Created attachment 699225 [details] [diff] [review]
Part 1 - change ImageContainerChild::SendFlush() to sync

one cause of abort was because of async ImageContainerChild::SendFlush(). This patch change ImageContainerChild::SendFlush() to synchronous.

Updated

5 years ago
Blocks: 827833
(Assignee)

Comment 14

5 years ago
By applying attachment 699225 [details] [diff] [review] , I face following problem.
- Sometimes, GonkNativeWindow::freeBufferLocked() falls into deadlock state

It seems that attachment 699225 [details] [diff] [review] hit another defect.
(Assignee)

Comment 15

5 years ago
Created attachment 699293 [details]
Deadlock log after applying attachment 699225 [details] [diff] [review]

get log by following command
> adb shell logcat -v threadtime > log.txt

dead lock happens like following sequence.
- [1] camera application stops preview
- [2] GonkNativeWindow frees all gralloc buffers by calling GonkNativeWindow::abandon() //camera thread
- [3] GonkNativeWindow require mutex lock. //camera thread
- [4] GonkNativeWindow::freeAllBuffersLocked() tries to free all gralloc buffers except rendering state buffers.  //camera thread
- [5] ImageBridgeChild::DeallocSurfaceDescriptorGralloc() is called to deallocate gralloc buffer.
- [6] DeallocSurfaceDescriptorGralloc() post a task of deallocate gralloc buffer to ImageBridge's thread
            and wait until the buffer is deleted
- [7] ImageBridge's thread processes some tasks on the thread.
          one of task is ImageContainerChild::SetIdleNow()
- [8] ImageContainerChild::SetIdleNow() delete gralloc buffer synchronously by change of attachment 699225 [details] [diff] [review]
       and frees some GonkIOSurfaceImage   // ImageBridge's thread
- [9] In GonkIOSurfaceImage::~GonkIOSurfaceImage(), CameraGraphicBuffer::Unlock() is called    // ImageBridge's thread
- [10] In CameraGraphicBuffer::Unlock(), GonkNativeWindow::returnBuffer() is called    // ImageBridge's thread
- [11] GonkNativeWindow::returnBuffer() tries to hold the mutex lock. But it is alrealy hold by [3] in different thread !!!!!!!!
(Assignee)

Comment 16

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> By applying attachment 699225 [details] [diff] [review] , I face following
> problem.
> - Sometimes, GonkNativeWindow::freeBufferLocked() falls into deadlock state
> 
> It seems that attachment 699225 [details] [diff] [review] hit another defect.

The deadlock seems similar to Bug 826765
(Assignee)

Comment 17

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> By applying attachment 699225 [details] [diff] [review] , I face following
> problem.
> - Sometimes, GonkNativeWindow::freeBufferLocked() falls into deadlock state
> 
> It seems that attachment 699225 [details] [diff] [review] hit another defect.

I locally confirmed that the deadlock could be prevented by following change.

- dispatch GonkNativeWindow::returnBuffer() to a different thread than ImageBridge's thread

Right now, I use main thread for it. It is better to assign dedicated thread for it, because performance is very important.
(Assignee)

Comment 18

5 years ago
I found 2 patterns to the aborts. Attachment 699225 [details] [diff] tried to address [1]. It is better to address [2] in different bug. 

[1] abort during stopping preview
After calling ImageContainerChild::SendPublishImage(), ImageContainerChild calls ImageContainerChild::SendFlush() and frees child side Graloc buffers.
Then on parent side, ImageContainerParent::RecvPublishImage() is called and send back previous image. But it is already deallocated on child side.

[2] abort on re-starting preview
camera app stop preview correctly and try to start preview again. 
There is a Image(GonkIOSurfaceImage) from last preview. And the Image references to a CameraGraphicBuffer.
When prevew starts, ImageContainer::SetCurrentImage() is called.
And within it, ImageContainerChild::SendImageAsync() is called and mActiveImage is replaced to new Image.
When mActiveImage is replaced, there is no refence to the GonkIOSurfaceImage from previous preview and the the GonkIOSurfaceImage is deleted.
During the destruction of the GonkIOSurfaceImage, CameraGraphicBuffer::Unlock() is called and Gralloc buffer is deleted.
After that on parent side, ImageContainerParent::RecvPublishImage() is called and send back previous image. But it is already deallocated on child side.
(Assignee)

Updated

5 years ago
Blocks: 828289
(Assignee)

Updated

5 years ago
Attachment #699225 - Attachment description: change ImageContainerChild::SendFlush() to sync → Part 1 - change ImageContainerChild::SendFlush() to sync
(Assignee)

Comment 19

5 years ago
Created attachment 699888 [details] [diff] [review]
Part 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

this patch prevent deadlock
(Assignee)

Updated

5 years ago
Attachment #699225 - Flags: review?(kchen)
(Assignee)

Updated

5 years ago
Attachment #699888 - Flags: review?(kchen)
I'll try to review this tonight.
No longer blocks: 828289
Depends on: 828289
(Assignee)

Comment 21

5 years ago
 
For [2], I created Bug 828289.

> [2] abort on re-starting preview
> camera app stop preview correctly and try to start preview again. 
> There is a Image(GonkIOSurfaceImage) from last preview. And the Image
> references to a CameraGraphicBuffer.
> When prevew starts, ImageContainer::SetCurrentImage() is called.
> And within it, ImageContainerChild::SendImageAsync() is called and
> mActiveImage is replaced to new Image.
> When mActiveImage is replaced, there is no refence to the GonkIOSurfaceImage
> from previous preview and the the GonkIOSurfaceImage is deleted.
> During the destruction of the GonkIOSurfaceImage,
> CameraGraphicBuffer::Unlock() is called and Gralloc buffer is deleted.
> After that on parent side, ImageContainerParent::RecvPublishImage() is
> called and send back previous image. But it is already deallocated on child
> side.
We'll definitely take a fix for these 3 related bugs so please don't stop working on them, but we won't hold the release for a camera app crasher that the user can easily work around.
blocking-basecamp: + → -
tracking-b2g18: --- → +
Comment on attachment 699888 [details] [diff] [review]
Part 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 699888 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by review!

I'm concerned about the thread-safety of 'sCameraGraphicBufferThreadRefCount'.  It looks like Start*() is called from the CameraGraphicBuffer (CGB) ctor, which is called from GonkNativeWindow::getCurrentBuffer() running on the camera driver's frame thread; but Release*() is called from sCameraGraphicBufferThread.  In this case, you should be using PR_ATOMIC_INCREMENT/_DECREMENT[1] on your reference counter.

Alternatively, you could add an nsRefPtr<nsIThread> to each CameraGraphicBuffer, and let the existing framework handle reference counting for you.  I just looked at [2] and nsThread uses NS_IMPL_THREADSAFE*().  I think this would simplify your code considerably.

1. See http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/pratom.h#66
2. http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#150

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +48,5 @@
> +    sCameraGraphicBufferThreadRefCount = 1;
> +
> +    nsresult rv = NS_NewNamedThread("CamGraphicBuf",
> +                                    getter_AddRefs(sCameraGraphicBufferThread));
> +    NS_ENSURE_SUCCESS(rv, );

Just use NS_ENSURE_SUCCESS_VOID(rv).

@@ +54,5 @@
> +
> +
> +static void ReleaseCameraGraphicBufferThread()
> +{
> +  if(--sCameraGraphicBufferThreadRefCount == 0) {

Put a check in here to make sure sCameraGraphicBufferThreadRefCount >= 0.

e.g.
if (NS_UNLIKELY(sCameraGraphicBufferThreadRefCount < 0)) {
  NS_ERROR("sCameraGraphicBufferThreadRefCount is < 0");
  MOZ_CRASH();
}

@@ +55,5 @@
> +
> +static void ReleaseCameraGraphicBufferThread()
> +{
> +  if(--sCameraGraphicBufferThreadRefCount == 0) {
> +    nsresult rv = sCameraGraphicBufferThread->Shutdown();

Based on your current implementation of StartCameraGraphicBufferThread(), --sCameraGraphicBufferThreadRefCount == 0 is not sufficient to guarantee that sCameraGraphicBufferThread != nullptr.  (i.e. if rv != NS_OK in the ctor.)

@@ +56,5 @@
> +static void ReleaseCameraGraphicBufferThread()
> +{
> +  if(--sCameraGraphicBufferThreadRefCount == 0) {
> +    nsresult rv = sCameraGraphicBufferThread->Shutdown();
> +    NS_ENSURE_SUCCESS(rv, );

NS_ENSURE_SUCCESS_VOID(rv)

@@ +649,5 @@
> +void CameraGraphicBuffer::Unlock() MOZ_OVERRIDE
> +{
> +    nsCOMPtr<nsIRunnable> returnBufferTask = new ReturnBufferTask(this);
> +    nsresult rv = sCameraGraphicBufferThread->Dispatch(returnBufferTask, NS_DISPATCH_NORMAL);
> +    NS_ENSURE_SUCCESS(rv, );

NS_ENSURE_SUCCESS_VOID(rv)

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +240,5 @@
>      typedef mozilla::layers::SurfaceDescriptor SurfaceDescriptor;
>      typedef mozilla::layers::ImageBridgeChild ImageBridgeChild;
>  
>  public:
> +public:

Remove duplicate 'public:' clause.

@@ +269,5 @@
> +        : mCameraGraphicBuffer(aCameraGraphicBuffer)
> +    {
> +    }
> +
> +    virtual ~ReturnBufferTask()

Make this protected.
Comment on attachment 699888 [details] [diff] [review]
Part 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 699888 [details] [diff] [review]:
-----------------------------------------------------------------

One more question..

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +634,5 @@
> +  , mGeneration(aGeneration)
> +  , mLocked(true)
> +{
> +    DOM_CAMERA_LOGT("%s:%d : this=%p\n", __func__, __LINE__, this);
> +    StartCameraGraphicBufferThread();

Since we are creating and releasing the camera graphic buffer thread on the creation and destruction of camera graphic buffer, could we just hold a reference to the thread here and let nsCOMPtr to handle the reference count?

Like

ctor:
  mThread = GetCameraGraphicBufferThread();

dtor:
  mThread = nullptr;

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +240,5 @@
>      typedef mozilla::layers::SurfaceDescriptor SurfaceDescriptor;
>      typedef mozilla::layers::ImageBridgeChild ImageBridgeChild;
>  
>  public:
> +public:

Remove this.

@@ +271,5 @@
> +    }
> +
> +    virtual ~ReturnBufferTask()
> +    {
> +    }

Remove this.
Attachment #699225 - Flags: review?(kchen) → review+
> @@ +271,5 @@
> > +    }
> > +
> > +    virtual ~ReturnBufferTask()
> > +    {
> > +    }
> 
> Remove this.

What mikeh said.
(Assignee)

Comment 26

5 years ago
Created attachment 700429 [details] [diff] [review]
Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

change
- remove thread ref counter
- GonkNativeWindow creates and hold a thread
Attachment #699888 - Attachment is obsolete: true
Attachment #699888 - Flags: review?(kchen)
(Assignee)

Comment 27

5 years ago
Created attachment 700434 [details] [diff] [review]
Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

change
- remove thread ref counter
- GonkNativeWindow creates and hold a thread
Attachment #700429 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Attachment #700434 - Flags: review?(mhabicher)
Comment on attachment 700434 [details] [diff] [review]
Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 700434 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by!

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +606,5 @@
> +                                    getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS_VOID(rv);
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()

nsCOMPtr<nsIThread>

@@ +608,5 @@
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> +{
> +    return mThread;

I think we could create the thread here lazily.

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +82,5 @@
>      // Release all internal buffers
>      void abandon();
>  
>      SurfaceDescriptor *getSurfaceDescriptorFromBuffer(ANativeWindowBuffer* buffer);
> +    nsIThread* getGonkNativeWindowThread();

nsCOMPtr<nsIThread>

@@ +284,5 @@
> +    virtual ~ReturnBufferTask()
> +    {
> +    }
> +
> +protected:

Remove this one.
Comment on attachment 700434 [details] [diff] [review]
Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 700434 [details] [diff] [review]:
-----------------------------------------------------------------

Good work.  This looks much simpler.  Please fix up those issues and this will probably be ready to land after the next review.

::: ./src_org/dom/camera/GonkNativeWindow.cpp
@@ +603,5 @@
> +void GonkNativeWindow::startGonkNativeWindowThread()
> +{
> +    nsresult rv = NS_NewNamedThread("CamGraphicBuf",
> +                                    getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS_VOID(rv);

Since there's not much we can do if this fails, it's probably better to assert here.  Something like:

if (NS_UNLIKELY(NS_FAILED(rv))) {
  NS_ERROR("Failed to start camera graphic buffer disposal thread");
  MOZ_CRASH();
}

@@ +606,5 @@
> +                                    getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS_VOID(rv);
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()

already_AddRefed<...> ... ?

@@ +608,5 @@
> +}
> +
> +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> +{
> +    return mThread;

I think it's better to create it when we init the GonkNativeWindow, so that we don't have to keep checking for a thread on every new CGB.

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +82,5 @@
>      // Release all internal buffers
>      void abandon();
>  
>      SurfaceDescriptor *getSurfaceDescriptorFromBuffer(ANativeWindowBuffer* buffer);
> +    nsIThread* getGonkNativeWindowThread();

kanru: shouldn't this be
already_AddRefed<nsIThread> getGonkNativeWindowThread(); ?
Whiteboard: [b2g-crash] → [b2g-crash] [cr 437569]
(In reply to Mike Habicher [:mikeh] from comment #29)
> > +    nsIThread* getGonkNativeWindowThread();
> 
> kanru: shouldn't this be
> already_AddRefed<nsIThread> getGonkNativeWindowThread(); ?

We are not giving up the ownership so it's nsCOMPtr<nsIThread>
(In reply to Mike Habicher [:mikeh] from comment #29)
> Comment on attachment 700434 [details] [diff] [review]
> Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's
> thread
> 
> Review of attachment 700434 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> > +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> 
> already_AddRefed<...> ... ?

Actually, never mind--we're both wrong.  You want the returned object to get a new reference when it's stuffed into an nsCOMPtr<>, so in that case, a raw nsIThread* is correct.

See: https://developer.mozilla.org/en-US/docs/Using_nsCOMPtr/Getting_Started_Guide#Using_nsCOMPtr

(Specfically, look under "nsCOMPtr<T> f() don't return an nsCOMPtr".)
(Assignee)

Comment 32

5 years ago
(In reply to Kan-Ru Chen [:kanru] from comment #28)
> Comment on attachment 700434 [details] [diff] [review]
> Part 2 rev 2 - move GonkNativeWindow::returnBuffer() out of ImageBridge's
> thread
> 
> Review of attachment 700434 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +608,5 @@
> > +}
> > +
> > +nsIThread* GonkNativeWindow::getGonkNativeWindowThread()
> > +{
> > +    return mThread;
> 
> I think we could create the thread here lazily.

I am going to keep startGonkNativeWindowThread(), because Sometimes, creating a thread needs a longer time. It is not good for realtime performance.
There's actually no need for getGonkNativeWindowThread() -- you're already in the class, just use mThread directly!
(Assignee)

Comment 34

5 years ago
(In reply to Mike Habicher [:mikeh] from comment #33)
> There's actually no need for getGonkNativeWindowThread() -- you're already
> in the class, just use mThread directly!

As I talked to you, getGonkNativeWindowThread() is necessary to pass a thread pointer from GonkNativeWindow to CameraGraphicBuffer.
(Assignee)

Comment 35

5 years ago
Created attachment 700489 [details] [diff] [review]
Part 2 rev 3 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

reflect comment
 - assert when failed to create a thread
 - remove redundant "protected:" in GonkNativeWindow.h
Attachment #700434 - Attachment is obsolete: true
Attachment #700434 - Flags: review?(mhabicher)
Comment on attachment 700489 [details] [diff] [review]
Part 2 rev 3 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread

Review of attachment 700489 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  r+ with that one nit fixed.

::: ./src_org/dom/camera/GonkNativeWindow.h
@@ +279,5 @@
> +    {
> +        mCameraGraphicBuffer->UnlockNow();
> +        return NS_OK;
> +    }
> +protected:

Nit: please insert a blank line before the 'protected' clause.
Attachment #700489 - Flags: review+
(Assignee)

Updated

5 years ago
Attachment #700489 - Flags: review+ → review?(mhabicher)
(Assignee)

Comment 37

5 years ago
Created attachment 700514 [details] [diff] [review]
Part 1 rev 2 - change ImageContainerChild::SendFlush() to sync
Attachment #698059 - Attachment is obsolete: true
Attachment #699083 - Attachment is obsolete: true
Attachment #699225 - Attachment is obsolete: true
Attachment #699293 - Attachment is obsolete: true
(Assignee)

Comment 38

5 years ago
Created attachment 700516 [details] [diff] [review]
Part 2 rev 4 - move GonkNativeWindow::returnBuffer() out of ImageBridge's thread
Attachment #700489 - Attachment is obsolete: true
Attachment #700489 - Flags: review?(mhabicher)
(Assignee)

Comment 39

5 years ago
mikeh, I start to re-think about Part2. attachment 700516 [details] [diff] [review] [diff] [review] affect very badly to H.264 video rendering. OMXCodec becomes to abort more often than before. I am going to think about to separate thread for ImageBridgeChild and ImageContainerChild.
 
When OMXCodec::freeBuffersOnPort() is called, all gralloc buffers needed to be within OMXCodec or GonkNativeWindow. attachment 700516 [details] [diff] [review] [diff] [review] is going to keep gralloc buffer longer time and is going to fail a buffer status check in OMXCodec::freeBuffersOnPort().
(Assignee)

Updated

5 years ago
Attachment #700516 - Attachment is obsolete: true
sotaro, could the OMXCodec instabilities be due to the sync flush change?
(Assignee)

Updated

5 years ago
Attachment #700514 - Attachment is obsolete: true
(Assignee)

Comment 41

5 years ago
(In reply to Mike Habicher [:mikeh] from comment #40)
> sotaro, could the OMXCodec instabilities be due to the sync flush change?

Yes, I confirmed that.
(Assignee)

Comment 42

5 years ago
(In reply to Mike Habicher [:mikeh] from comment #40)
> sotaro, could the OMXCodec instabilities be due to the sync flush change?

I checked about the problem. It hits another defect just because of function calls timing change.

- In ImageContainer::SetCurrentImage(), it the argument(aImage) is null pointer, ImageContainerChild::DispatchSetIdle() is called.
- ImageContainerChild::DispatchSetIdle() dispatched ImageContainerChild::SetIdleNow() to ImageBridge thread. It is asynchronous.
- In ImageContainerChild::SetIdleNow(), ImageContainerChild::SendFlush().
  SendFlush() is asynchronous compared to SetCurrentImage(). 

When the problem happens, SendFlush() is called after OMXCodec::stop().
After the SendFlush(), a buffer is returned to OMXCodec. But OMXCodec is already in shutdown state.
(Assignee)

Comment 43

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #42)
> (In reply to Mike Habicher [:mikeh] from comment #40)
> > sotaro, could the OMXCodec instabilities be due to the sync flush change?
> 
> I checked about the problem. It hits another defect just because of function
> calls timing change.

To solve the problem, it is necessary to change ImageContainerChild::DispatchSetIdle() to synchronous,
(Assignee)

Comment 44

5 years ago
(In reply to Mike Habicher [:mikeh] from comment #40)
> sotaro, could the OMXCodec instabilities be due to the sync flush change?

It seems that this problems is Bug 823324.
(Assignee)

Updated

5 years ago
Attachment #700514 - Attachment is obsolete: false
(Assignee)

Comment 45

5 years ago
Created attachment 700950 [details] [diff] [review]
Part2 - change DispatchSetIdle() to synchronous SetIdle()

fix OMXCodec instabilities be due to the sync flush change
(Assignee)

Comment 46

5 years ago
After apply in attachment 700950 [details] [diff] [review]. I could not re-produce the abort and deadlock.
There is still a risk of deadlock but it becomes rarely happens, I assume because shutdown happens sequentially as expected.
(Assignee)

Updated

5 years ago
Attachment #700950 - Flags: review?(kchen)
(In reply to Sotaro Ikeda [:sotaro] from comment #46)
>
> There is still a risk of deadlock but it becomes rarely happens, I assume
> because shutdown happens sequentially as expected.

What's involved in eliminating this risk?
Attachment #700950 - Flags: review?(kchen) → review+
(Assignee)

Comment 48

5 years ago
Before applying the patch, ImageContainerChild::SetIdleNow() and GonkNativeWindow::abandon() could happen at the same time. Because SetIdleNow() get called asynchronously. And deadlock happened when both functions called at the same time.

After applying the patch, SetIdleNow() called synchronously. Then both functions do not called at the same time.
(Assignee)

Comment 49

5 years ago
Created attachment 700993 [details] [diff] [review]
Part2 rev2 - change DispatchSetIdle() to synchronous SetIdle()

landable patch
carry kchen: review+ from previous patch
Attachment #700950 - Attachment is obsolete: true
Comment on attachment 700993 [details] [diff] [review]
Part2 rev2 - change DispatchSetIdle() to synchronous SetIdle()

Review of attachment 700993 [details] [diff] [review]:
-----------------------------------------------------------------

Note that this patch doesn't apply cleanly on m-c.

::: gfx/layers/ipc/ImageContainerChild.h
@@ +104,5 @@
>     */
>    void DispatchDestroy();
>  
>    /**
> +   * flush and deallocate the shared images in the pool.

Fix the comment.
http://tbpl.mozilla.org/?tree=Try&rev=56d80d160f26
(Assignee)

Comment 52

5 years ago
kanru, are there something I need to do more?

Updated

5 years ago
Whiteboard: [b2g-crash] [cr 437569] → [b2g-crash] [cr 437569], QARegressExclude
(Assignee)

Updated

5 years ago
Keywords: checkin-needed
(Assignee)

Comment 53

5 years ago
All green in try.
https://tbpl.mozilla.org/?tree=Try&rev=56d80d160f26
https://hg.mozilla.org/integration/mozilla-inbound/rev/9bc194aa7e14
https://hg.mozilla.org/integration/mozilla-inbound/rev/4966fa10cf1d
Keywords: checkin-needed
(Assignee)

Comment 55

5 years ago
nominate to "blocking-b2g: tef"
Camera crashes lots without this fix.
blocking-b2g: --- → tef?
Will this fix the issue in bug 827833 as well?
(Assignee)

Comment 57

5 years ago
(In reply to Mike Habicher [:mikeh] from comment #56)
> Will this fix the issue in bug 827833 as well?

Yes, I think.
blocking-b2g: tef? → tef+
https://hg.mozilla.org/mozilla-central/rev/9bc194aa7e14
https://hg.mozilla.org/mozilla-central/rev/4966fa10cf1d
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Can we uplift this to b2g18?
(Assignee)

Comment 60

5 years ago
Yes, it is necessary to uplift patches to b2g18.
But I do not have permission to do it. Does someone could uplift them?
(Assignee)

Comment 61

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #60)
> Yes, it is necessary to uplift patches to b2g18.
> But I do not have permission to do it. Does someone could uplift them?

I've heard that if the bug has "blocking-b2g:tef+", it is automatically uplifted.
https://hg.mozilla.org/releases/mozilla-b2g18/rev/10072dfd79c6
https://hg.mozilla.org/releases/mozilla-b2g18/rev/9e8abf446e3d
status-b2g18: --- → fixed
status-firefox19: --- → wontfix
status-firefox20: --- → wontfix
status-firefox21: --- → fixed
(In reply to Sotaro Ikeda [:sotaro] from comment #61)
> I've heard that if the bug has "blocking-b2g:tef+", it is automatically
> uplifted.

Yes, the checkin fairies take care of it ;)
No longer depends on: 828289
Blocks: 832119
(Assignee)

Updated

5 years ago
Duplicate of this bug: 828289
Landed on mozilla-b2g18/gaia master prior to the 1/25 branching to mozilla-b2g18_v1_0_0/v1.0.0, updating status-b2g-v1.0.0 to fixed.
status-b2g18-v1.0.0: --- → fixed
verified with 1.1 build of 20130322070202
Status: RESOLVED → VERIFIED
status-b2g18: fixed → verified
You need to log in before you can comment on or make changes to this bug.