Crash in mozilla::MediaOmxReader::NotifyDataArrived (ProcessCachedData runnable is running but reader has been destroyed)

VERIFIED FIXED in mozilla35

Status

()

defect
--
critical
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: rkunkel, Assigned: rlin)

Tracking

({crash, regression, reproducible})

unspecified
mozilla35
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:2.2+, b2g-v2.0 unaffected, b2g-v2.1 unaffected, b2g-v2.2 verified)

Details

(crash signature)

Attachments

(2 attachments, 8 obsolete attachments)

Reporter

Description

5 years ago
This bug was filed from the Socorro interface and is 
report bp-3b60537e-d6b6-4942-9e9c-d254c2140918.
=============================================================

I encountered this crash on the latest 2.2 Flame KK build:

Enviromental Variables:
----------------------------------------
Device: Flame 2.2 Master
BuildID: 20140917223000
Gaia: d37950eb09e28aa18d0e01df9ff90574bd4337e0
Gecko: 426497473505
Version: 35.0a1 (2.2 Master)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

Repro Steps:

1) Update device to latest buildID: 20140917223000
2) Have more than 1 song on the SD card
3) Open music app and tap the fastforward and rewind buttons

Repro Rate: 100% - 5/5 attempts
Reporter

Updated

5 years ago
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Reporter

Updated

5 years ago
Whiteboard: [2.1-Daily-Testing]
QAWanted for branch checks.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: qawanted
Whiteboard: [2.1-Daily-Testing]
Assuming tap means "tap a lot" I can reproduce this on my Nexus 5.
Take it.
Assignee: nobody → rlin
Flags: needinfo?(rlin)
Should stop ProcessingCachedData process if decoder/reader enter shutdown stage.
Posted patch patch v1 (obsolete) — Splinter Review
This patch stops the ProcessCachedData process when MediaOmxReader enter shutdown stage.

I try to use mDecoder->IsShutdown() in OmxReaderNotifyDataArrivedRunnable::NotifyDataArrived() 
But it only reduce the crash rate.
So I create mIsShutdown variable to check if this reader has entered shutdown stage.
Attachment #8492073 - Flags: review?(edwin)
Attachment #8492073 - Flags: feedback?(jwwang)
(In reply to Randy Lin [:rlin] from comment #6)
> I try to use mDecoder->IsShutdown() in
> OmxReaderNotifyDataArrivedRunnable::NotifyDataArrived() 
> But it only reduce the crash rate.
> So I create mIsShutdown variable to check if this reader has entered
> shutdown stage.

This is because MediaOmxReader holds a raw pointer to |mDecoder| which becoming a dangling pointer.
Comment on attachment 8492073 [details] [diff] [review]
patch v1

Review of attachment 8492073 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +100,5 @@
>    {
>      const char* buffer = mBuffer.get();
>  
>      while (mLength) {
> +      if (mOmxReader->IsShutdown()) {

It won't work this way. Polling incurs racing. MediaOmxReader::Shutdown() could happen in between |mOmxReader->IsShutdown()| and |mOmxReader->NotifyDataArrived()|.

Also, since OmxReaderProcessCachedDataTask and OmxReaderNotifyDataArrivedRunnable hold strong reference to MediaOmxReader, the life cycle of MediaOmxReader is extended and mDecoder could become invalid during the extension. That will also result in memory corruptions.
Attachment #8492073 - Flags: feedback?(jwwang) → feedback-
Attachment #8492073 - Flags: review?(edwin)
This bug repro's on Flame KK builds: Flame 2.2, OpenC 2.2

Actual Results: Able to crash the music app by playing a song then hitting forward and rewind a couple times.

Repro Rate: 6/6

Environmental Variables:
Device: Flame Master KK
BuildID: 20140922040649
Gaia: 3802009e1ab6c3ddfc3eb15522e3140a96b33336
Gecko: 5e704397529b
Version: 35.0a1 (Master) 
Firmware Version: L1TC10011800
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0
-----------------------------------------------------------------
Environmental Variables:
Device: Open_C Master
BuildID: 20140922040649
Gaia: 3802009e1ab6c3ddfc3eb15522e3140a96b33336
Gecko: 5e704397529b
Version: 35.0a1 (Master) 
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

-----------------------------------------------------------------
-----------------------------------------------------------------

This bug does NOT repro on Flame kk build: Flame 2.1, Flame 2.0, Flame 2.0 KK Base

Actual Result: No crash occurs when using the forward and rewind buttons in the music app.

Repro Rate: 0/15

Environmental Variables:
Device: Flame 2.1 KK
BuildID: 20140922053548
Gaia: 689c4ad4d8c3e4aa95805a2e49ae6cf786a1ae91
Gecko: 185fc54d29c1
Version: 34.0a2
Firmware Version: L1TC10011800
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
-----------------------------------------------------------------
Environmental Variables:
Device: Flame 2.0 KK
BuildID: 20140920152249
Gaia: 0658006be8a00fdb5931ee15a0aa353a3ab231ba
Gecko: c0086da55273
Version: 32.0 (2.0) 
Firmware Version: L1TC10011800
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
-----------------------------------------------------------------
Environmental Variables:
Device: Flame 2.0 KK Base
BuildID: 20140904160718
Gaia: 506da297098326c671523707caae6eaba7e718da
Gecko: 
Version: 32.0 (2.0)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
QA Contact: croesch
nomming to block - regression in a major feature, reproducible crash
blocking-b2g: --- → 2.2?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
QA Contact: croesch
Blocking Reason: Basic function regressed
blocking-b2g: 2.2? → 2.2+
Component: Gaia::Music → Video/Audio
Product: Firefox OS → Core
I meant basic function causing crash in my previous comment
Duplicate of this bug: 1071462
Posted patch WIP (obsolete) — Splinter Review
Hi JW, 
Could you take a look?
Attachment #8492073 - Attachment is obsolete: true
Attachment #8494220 - Flags: feedback?(jwwang)
Posted patch WIP (obsolete) — Splinter Review
this one is correct.
Attachment #8494220 - Attachment is obsolete: true
Attachment #8494220 - Flags: feedback?(jwwang)
Attachment #8494221 - Flags: feedback?(jwwang)
Comment on attachment 8494221 [details] [diff] [review]
WIP

Review of attachment 8494221 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +590,5 @@
>    nsresult rv = mDecoder->GetResource()->ReadFromCache(buffer.get(),
>                                                         aOffset, bufferLength);
>    NS_ENSURE_SUCCESS(rv, -1);
>  
> +  mOmxReaderNotifyDataArrivedRunnable =

Per discussion offline, this could happen after Shutdown() and won't get revoked.
Attachment #8494221 - Flags: feedback?(jwwang)
Posted patch bug.patch v3 (obsolete) — Splinter Review
Thanks JW, 
Here is the new one, avoid to create new runnable when enter shutdown stage.
Attachment #8494221 - Attachment is obsolete: true
Attachment #8494295 - Flags: feedback?(jwwang)
Comment on attachment 8494295 [details] [diff] [review]
bug.patch v3

Review of attachment 8494295 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +45,5 @@
>  
>    void Run()
>    {
>      MOZ_ASSERT(!NS_IsMainThread());
> +    if (mOmxReader) {

See below. We don't need this check for we won't create an OmxReaderProcessCachedDataTask object when mOmxReader is null.

@@ +125,5 @@
>        // We cannot read data in the main thread because it
>        // might block for too long. Instead we post an IO task
>        // to the IO thread if there is more data available.
>        XRE_GetIOMessageLoop()->PostTask(FROM_HERE,
> +          new OmxReaderProcessCachedDataTask(mOmxReader, mOffset));

Don't even bother posting the task when mOmxReader is null.

@@ +179,5 @@
>  }
>  
>  void MediaOmxReader::Shutdown()
>  {
> +  if (mOmxReaderNotifyDataArrivedRunnable.get()) {

Just say |if (mOmxReaderNotifyDataArrivedRunnable)|.

@@ +182,5 @@
>  {
> +  if (mOmxReaderNotifyDataArrivedRunnable.get()) {
> +    mOmxReaderNotifyDataArrivedRunnable->Revoke();
> +  }
> +  MutexAutoLock lock(mMutex);

Move this lock above, since mOmxReaderNotifyDataArrivedRunnable is accessed by different threads.

@@ +597,5 @@
>                                             buffer.forget(),
>                                             bufferLength,
>                                             aOffset,
> +                                           resourceLength);
> +  MutexAutoLock lock(mMutex);

Move this lock above, since mOmxReaderNotifyDataArrivedRunnable is accessed by different threads.

@@ +598,5 @@
>                                             bufferLength,
>                                             aOffset,
> +                                           resourceLength);
> +  MutexAutoLock lock(mMutex);
> +  if (mIsShutdown) {

We should check |mIsShutdown| at the top of this function and return immediately if it is true. Otherwise it is invalid to access |mDecoder| after shutdown.

::: content/media/omx/MediaOmxReader.h
@@ +26,5 @@
>    class TimeRanges;
>  }
>  
>  class AbstractMediaDecoder;
> +class OmxReaderNotifyDataArrivedRunnable;

You might want to have OmxReaderNotifyDataArrivedRunnable inherit nsICancelableRunnable to keep OmxReaderNotifyDataArrivedRunnable in .cpp only.

@@ +44,5 @@
>  protected:
>    android::sp<android::OmxDecoder> mOmxDecoder;
>    android::sp<android::MediaExtractor> mExtractor;
>    MP3FrameParser mMP3FrameParser;
> +  nsRefPtr<OmxReaderNotifyDataArrivedRunnable> mOmxReaderNotifyDataArrivedRunnable;

Then we can say nsRefPtr<nsICancelableRunnable> here.
Attachment #8494295 - Flags: feedback?(jwwang)
Posted patch bug.patch v3 (obsolete) — Splinter Review
Modify mutex scope.
Also changing the nsRefPtr<nsICancelableRunnable> usage in header file.
Attachment #8494295 - Attachment is obsolete: true
Attachment #8494418 - Flags: feedback?(jwwang)

Updated

5 years ago
Crash Signature: [@ libxul.so@0xc797c6 | libxul.so@0xc79e55 | libxul.so@0x358de1 | libmozglue.so@0x2a91d] → [@ libxul.so@0xc797c6 | libxul.so@0xc79e55 | libxul.so@0x358de1 | libmozglue.so@0x2a91d] [@ mozilla::MediaOmxReader::NotifyDataArrived(char const*, unsigned int, long long) ]
Summary: crash in libxul.so@0xc797c6 | libxul.so@0xc79e55 | libxul.so@0x358de1 | libmozglue.so@0x2a91d → Crash in mozilla::MediaOmxReader::NotifyDataArrived (ProcessCachedData runnable is running but reader has been destroyed)
Comment on attachment 8494418 [details] [diff] [review]
bug.patch v3

Review of attachment 8494418 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +50,5 @@
>      mOmxReader->ProcessCachedData(mOffset, false);
>    }
>  
>  private:
> +  MediaOmxReader* mOmxReader;

This should be an nsRefPtr to avoid mOmxReader blows up during execution.

@@ +69,5 @@
>  // a task to the IO thread for retrieving the next chunk of data, and
>  // the IO task dispatches a runnable to the main thread for parsing the
>  // data. This goes on until all of the MP3 file has been parsed.
>  
> +class OmxReaderNotifyDataArrivedRunnable : public nsICancelableRunnable

We have nsCancelableRunnable.

@@ +132,4 @@
>      }
>    }
>  
> +  MediaOmxReader* mOmxReader;

Use nsRefPtr here for Cancel() will release |mOmxReader|.

@@ +133,5 @@
>    }
>  
> +  MediaOmxReader* mOmxReader;
> +  // Protect mOmxReader flag, may access by decoder thread or main thread
> +  Mutex mMutex;

It is easy to cause dead lock when using 2 locks (one from OmxReaderNotifyDataArrivedRunnable and the other from MediaOmxReader). Lock order must be carefully taken. It is better to share the same lock between OmxReaderNotifyDataArrivedRunnable and MediaOmxReader to avoid the trouble.
Attachment #8494418 - Flags: feedback?(jwwang)
>>We have nsCancelableRunnable.
But I have to add some data members on that class, 
Directly overwrite the Run() or Cancel() function can't meet our requirement.
(In reply to Randy Lin [:rlin] from comment #21)
> >>We have nsCancelableRunnable.
> But I have to add some data members on that class, 
> Directly overwrite the Run() or Cancel() function can't meet our requirement.

Can't you subclass nsCancelableRunnable and add the members you need?
Posted patch bug.patch v4 (obsolete) — Splinter Review
Fix nits, 
I also let OmxReaderNotifyDataArrivedRunnable::Cancel() 
running in main thread to remove mutex usage in this class.
Attachment #8494418 - Attachment is obsolete: true
Attachment #8495782 - Flags: feedback?(jwwang)
Comment on attachment 8495782 [details] [diff] [review]
bug.patch v4

Review of attachment 8495782 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.cpp
@@ +102,5 @@
>      return NS_OK;
>    }
>  
>  private:
>    void NotifyDataArrived()

Since Cancel() happens on the main thread as this function, you can exit this function immediately if |mOmxReader| is null.

@@ +176,5 @@
>  }
>  
> +void MediaOmxReader::CancelProcessCachedData()
> +{
> +  if (mOmxReaderNotifyDataArrivedRunnable) {

You need to lock when accessing |mOmxReaderNotifyDataArrivedRunnable|.

@@ +186,5 @@
> +  MutexAutoLock lock(mMutex);
> +  mIsShutdown = true;
> +  nsCOMPtr<nsIRunnable> cancelEvent =
> +    NS_NewRunnableMethod(this, &MediaOmxReader::CancelProcessCachedData);
> +  NS_DispatchToMainThread(cancelEvent);

Since OmxReaderProcessCachedDataTask also holds a strong reference to MediaOmxReader, we should also cancel it to ensure the life cycle of MediaOmxReader is not extended and destroyed in an unexpected thread.

However, if it is fine to delete MediaOmxReader in an arbitrary thread, we don't need to cancel the runnables. We can just exit NotifyDataArrived() when mIsShutdown is true and this approach is much simpler.

http://hg.mozilla.org/mozilla-central/annotate/98a9383dd188/content/media/omx/OmxDecoder.cpp#l83
Refer to the previous revision. OmxDecoderProcessCachedDataTask did make sure OmxDecoder is deleted on the main thread. I think we should also ensure MediaOmxReader is deleted on the main thread.
Attachment #8495782 - Flags: feedback?(jwwang)
Posted patch patch v5 (obsolete) — Splinter Review
The MediaOmxReader is threadsafe reference counting class.
On this patch, I create a mIsShutdown flag and let it changed on main thread.
So that it only need one lock on the MediaOmxReader::ProcessCachedData. (IO/ decoder thread)
Attachment #8495782 - Attachment is obsolete: true
Comment on attachment 8497401 [details] [diff] [review]
patch v5

Review of attachment 8497401 [details] [diff] [review]:
-----------------------------------------------------------------

Overall looks good to me. Since OmxReaderProcessCachedDataTask holds a strong reference to the MediaOmxReader, is it ok for MediaOmxReader to be deleted in the IO thread?

::: content/media/omx/MediaOmxReader.cpp
@@ +96,4 @@
>    }
>  
>  private:
>    void NotifyDataArrived()

Since this function runs in the main thread, you can return immediately if |mOmxReader->IsShutdown()| is true. There is no way for |mOmxReader->IsShutdown()| to become true in the middle of the while loop.
Attachment #8497401 - Flags: feedback+
Posted patch patch v5.1 (obsolete) — Splinter Review
Thanks JW,
This patch modified the nits on NotifyDataArrived() function. 
As my test result by switching the song quickly, I didn't found any crashes.
Attachment #8497922 - Flags: review?(edwin)
Duplicate of this bug: 1075259
Blocks: 862140
Hi Edwin, 
Do you have time to review that? thanks.
Flags: needinfo?(edwin)
Comment on attachment 8497922 [details] [diff] [review]
patch v5.1

Review of attachment 8497922 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/MediaOmxReader.h
@@ +29,5 @@
>  class AbstractMediaDecoder;
>  
>  class MediaOmxReader : public MediaOmxCommonReader
>  {
> +  Mutex mMutex;

Add a comment here about which fields this mutex protects. Consider putting them next to each other.

@@ +92,5 @@
>    virtual void SetIdle() MOZ_OVERRIDE;
>  
>    virtual void Shutdown() MOZ_OVERRIDE;
>  
> +  bool IsShutdown() {

Take the mutex here instead of expecting callers to do so.
Attachment #8497922 - Flags: review?(edwin) → review+
Fix nits, carry reviewer,
try result
https://tbpl.mozilla.org/?tree=Try&rev=1faecf6fd012
Attachment #8497401 - Attachment is obsolete: true
Attachment #8497922 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/d5290c0ca86e
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Duplicate of this bug: 1071431

Comment 35

5 years ago
Issue verified as fixed on Flame 2.2.

Actual Result: Having more than one song saved to the SD card, and pressing the fast forward and rewind buttons repeatedly no longer incurs a crash.

Device: Flame 2.2 (319mb)(Kitkat Base)(Shallow Flash)
Build ID: 20141121040204
Gaia: 25388c6bce932657ebf93adedf31881bfaf88c15
Gecko: 3366c0fcf9c2
Version: 36.0a1 (2.2)
Firmware Version: v188-1
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.