Closed Bug 1358241 Opened 3 years ago Closed 2 years ago

Crash in LibHandle::ReleaseDirectRef

Categories

(Firefox for Android :: General, defect, critical)

Unspecified
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: mccr8, Assigned: esawin)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-4b3fef83-8cfd-43b0-a70a-a71470170420.
=============================================================

I'm not really sure what component to put this in. 9 crashes on 3 installations, which isn't a lot, but it makes it fairly high volume for Android Nightly.

The stack looks like this:
  LibHandle::ReleaseDirectRef
  __wrap_dlclose
  webrtc::AudioDeviceModuleImpl::CreatePlatformSpecificObjects
  webrtc::AudioDeviceModuleImpl::Create
  webrtc::VoEBaseImpl::Init
  mozilla::MediaEngineWebRTC::EnumerateAudioDevices

These are hitting a MOZ_CRASH:
  MOZ_CRASH(pthread_mutex_unlock failed)
Any idea, Eugen? It looks like you added this code.
Flags: needinfo?(esawin)
The crash can only occur if the mutex is invalid.
I assume it has been destroyed because we're releasing the last reference, destructing the handle, which destroys the mutex and then try to unlock the now invalid mutex.

With this patch we avoid this by only unlocking the mutex if we have any references left.
Assignee: nobody → esawin
Flags: needinfo?(esawin)
Attachment #8860171 - Flags: review?(nchen)
Attachment #8860171 - Attachment is obsolete: true
Attachment #8860171 - Flags: review?(nchen)
Attachment #8860172 - Flags: review?(nchen)
Comment on attachment 8860172 [details] [diff] [review]
0001-Bug-1358241-1.0-Only-release-mutex-if-any-references.patch

Review of attachment 8860172 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure this will work. Basically `directRefCnt == 0` doesn't mean the object will get destroyed, because object destruction is controlled by `AtomicRefCounted` and its refcount. If `directRefCnt` goes to zero, but the object is still alive, some other code can call `AddDirectRef` again and be deadlocked because we never unlocked the mutex.

So one better fix is to unlock only if the `AtomicRefCounted` refcount is non-zero, but you need to modify `AtomicRefCounted::Release` to return whether the object is still alive or not.

However, I think it's a lot easier to just make `directRefCnt` atomic and not use a mutex. For the assertion, you can use the result from the decrement operation, to ensure the value you use is still atomic.
Attachment #8860172 - Flags: review?(nchen) → feedback+
Make direct ref count atomic and remove mutex.
Attachment #8860172 - Attachment is obsolete: true
Attachment #8861031 - Flags: review?(nchen)
dlopen/dlclose may be called from different threads (e.g. for media libraries) in which case we need to lock access around the lib handle caching.
Attachment #8861033 - Flags: review?(nchen)
Comment on attachment 8861031 [details] [diff] [review]
0001-Bug-1358241-1.1-Make-direct-library-reference-counte.patch

Review of attachment 8861031 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/ElfLoader.h
@@ +156,1 @@
>      ++directRefCnt;

Move this to below AddRef()

@@ +163,4 @@
>     */
>    bool ReleaseDirectRef()
>    {
> +    if (!directRefCnt) {

Don't need this check here because it's racy. You should assert the result instead.

@@ +168,5 @@
>      }
> +    const MozRefCountType count = --directRefCnt;
> +    mozilla::external::AtomicRefCounted<LibHandle>::Release();
> +    MOZ_ASSERT(count <=
> +               mozilla::external::AtomicRefCounted<LibHandle>::refCount());

Assert that `count + 1 != 0`; otherwise the ref count was already 0.

@@ +169,5 @@
> +    const MozRefCountType count = --directRefCnt;
> +    mozilla::external::AtomicRefCounted<LibHandle>::Release();
> +    MOZ_ASSERT(count <=
> +               mozilla::external::AtomicRefCounted<LibHandle>::refCount());
> +    return count;

`!!count`
Attachment #8861031 - Flags: review?(nchen) → review+
Comment on attachment 8861033 [details] [diff] [review]
0002-Bug-1358241-2.0-Add-mutex-locking-around-the-library.patch

Review of attachment 8861033 [details] [diff] [review]:
-----------------------------------------------------------------

Don't you need the lock in `::stats` and `::Register` as well?

::: mozglue/linker/ElfLoader.h
@@ +488,4 @@
>    typedef std::vector<LibHandle *> LibHandleList;
>    LibHandleList handles;
>  
> +  mutable pthread_mutex_t handlesMutex;

Don't need `mutable`
Attachment #8861033 - Flags: review?(nchen) → review+
Pushed by esawin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/751455b663d0
[1.2] Make direct library reference counter atomic to avoid mutex locking issues. r=jchen
https://hg.mozilla.org/integration/mozilla-inbound/rev/b37b46c7f38f
[2.1] Add mutex locking around the library handles cache. r=jchen
https://hg.mozilla.org/mozilla-central/rev/751455b663d0
https://hg.mozilla.org/mozilla-central/rev/b37b46c7f38f
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Blocks: 1346042
You need to log in before you can comment on or make changes to this bug.