Crash in mozilla::MP4Demuxer::~MP4Demuxer

RESOLVED FIXED

Status

()

P1
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: njn, Assigned: jya)

Tracking

({crash})

Trunk
Unspecified
Mac OS X
crash
Points:
---

Firefox Tracking Flags

(firefox48 affected, firefox49 affected, firefox-esr45 affected, firefox50 affected, firefox51 affected)

Details

(crash signature)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-09b9bf6c-82f3-455b-8af8-424a32160516.
=============================================================

Crashes with this signature have occurred 102 times in the past week, all on Mac OS X, in versions ranging from FF 42--49.

Here's a search giving all crashes since May 10th:

https://crash-stats.mozilla.com/signature/?product=Firefox&platform=mac&date=%3E%3D2016-05-10&signature=mozilla%3A%3AMP4Demuxer%3A%3A~MP4Demuxer&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1

Looking at the more recent ones, some of them crash at address 0x0, and some crash at addresses like 0x101 or 0x10135.

Looking at the stack traces, the top frame sometimes points to the first line of the MP4Demuxer class, and sometimes to the  |NS_INLINE_DECL_THREADSAFE_REFCOUNTING(Stream);| line inside Stream. My guess is that something is going wrong with destruction and refcounting of this class.

jya, any ideas?
Flags: needinfo?(jyavenard)
(Assignee)

Comment 1

3 years ago
I've seen a few reports where whenever the crash occurred, the backtrace show that we have two threads accessing the mp4 object, (the rust demuxers always appear in the stack trace).

Now of course, it could be a coincidence and that there are effectively multiple mp4/MSE being used at once.

https://crash-stats.mozilla.com/report/index/09b9bf6c-82f3-455b-8af8-424a32160516#allthreads
https://crash-stats.mozilla.com/report/index/b0c97f5a-6aed-4cac-b9a3-abfa22160517#allthreads
https://crash-stats.mozilla.com/report/index/8594d478-de07-445b-91c2-1b09a2160516#allthreads (this one it's mp4parse_get_track_info)

it's just surprising that all the crashes in 49 I looked at, shows concurrent access to mozilla::MP4Demuxer::Init() on one and mozilla::MP4Demuxer::~MP4Demuxer() in the other.
Flags: needinfo?(jyavenard)
(Assignee)

Comment 2

3 years ago
JW, you're very good at finding weird stuff there... If you could have a look at this...
Flags: needinfo?(jwwang)
Priority: -- → P1
(Assignee)

Comment 3

3 years ago
Nathan, There's a long list of very weird crashes that has plagued media for a while, ones that never made any sense and only ever affect Mac.

I've never seen those type of crashes on Linux or Windows, despite that last one being by far where the majority of our users are.

The only logical explanation when I look at those crashes is that a data race occurred between two threads. But by design, this should never happen because everything run via a series of runnable launched sequentially on a single common TaskQueue

I've looked at the code over and over, and not ever is the above paradigm broken: we can't have by design data races.
And yet, those crashes are almost always two threads playing with the same objects, or ran out of order, always on Mac...

It should be impossible for a TaskQueue/ThreadPool to run more than one task at a time.

I noticed this comment in TaskQueue.cpp

https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TaskQueue.cpp#165

That we rely on unlocking/locking to apply memory barriers. 
Could there be that in Mac there's something slightly different with how monitor/memory barriers work?

It's a shot in the dark, but I can't think of anything else at this stage...
Flags: needinfo?(nfroyd)
(In reply to Jean-Yves Avenard [:jya] from comment #3)
> That we rely on unlocking/locking to apply memory barriers. 
> Could there be that in Mac there's something slightly different with how
> monitor/memory barriers work?

No.  TaskQueue relies on the locking primitives providing the guarantees that POSIX provides:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

while it's possible that OS X doesn't provide those guarantees, it's extremely unlikely and it'd be hard to write correct parallel programs without those guarantees.  (Note that those guarantees don't apply to the Windows locking primitives, but again, it's hard to imagine how general locking primitives can do their job without providing POSIX-like guarantees.)

One unlikely possibility is that locking the mutex is actually returning an error for some reason, which means that you could easily have multiple threads inside a critical section, since NSPR's locking primitives don't really do anything for errors.  But then again, if you look at the errors that mutex locking and unlocking can return:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html

virtually none of them can happen since we're not using recursive locks and I really hope we're not attempting to lock the mutex on the same thread multiple times...
Flags: needinfo?(nfroyd)
I strongly suspect bug 1281295 (TSan: data race involving ~TrackBuffersManager() -> ShutdownDemuxers() on main thread) here too!

E.g., in https://crash-stats.mozilla.com/report/index/c41a529e-8e2a-4d58-8b3c-a49ec2160611#allthreads, the main thread is in ~SourceBufferResource().
One of the members of TrackBuffersManager is 'RefPtr<SourceBufferResource> mCurrentInputBuffer'.
So I'm guessing TrackBuffersManager is being destroyed, including its mInputDemuxer (which should be destroyed right before mCurrentInputBuffer), which would explain the crash in ~MP4Demuxer in thread 39.

So I think we should get bug 1281295 fixed, and see how many other signatures benefit from it.
Depends on: 1281295
Actually, looking at https://crash-stats.mozilla.com/report/index/c41a529e-8e2a-4d58-8b3c-a49ec2160611#allthreads again, the runnable (that calls TrackBuffersManager::SegmentParserLoop() in the crashing thread) keeps an owning reference to the TrackBuffersManager, so I don't see how the last reference could have been released on the main thread.
Will need to investigate further...
No longer depends on: 1281295
Sorry: Bug 1281295 was a red herring, nothing to do with TrackBuffersManager.
Assignee: nobody → jyavenard
Crash volume for signature 'mozilla::MP4Demuxer::~MP4Demuxer':
 - nightly (version 51): 1 crash from 2016-08-01.
 - aurora  (version 50): 1 crash from 2016-08-01.
 - beta    (version 49): 8 crashes from 2016-08-02.
 - release (version 48): 53 crashes from 2016-07-25.
 - esr     (version 45): 21 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       1
 - aurora        0       0       0
 - beta          3       1       2
 - release      14       9       6
 - esr           1       1       1

Affected platform: Mac OS X

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly
 - aurora            #905
 - beta    #4117
 - release #798
 - esr     #1758
status-firefox48: --- → affected
status-firefox50: --- → affected
status-firefox51: --- → affected
status-firefox-esr45: --- → affected
(Assignee)

Comment 9

2 years ago
No crashes since bug 1315631 went in... continue to monitor for one week, after that can close as duplicate (hopefully)
See Also: → bug 1315631
Per comment 9, keep monitoring.
Flags: needinfo?(jwwang)
You need to log in before you can comment on or make changes to this bug.