Closed Bug 1321582 Opened 8 years ago Closed 7 years ago

REFTEST PROCESS-CRASH | layout/base/crashtests/1234622-1.html | application crashed [@ mozalloc_abort]

Categories

(Firefox for Android Graveyard :: Audio/Video, defect)

ARM
Android
defect
Not set
normal

Tracking

(fennec53+, firefox53 affected)

RESOLVED DUPLICATE of bug 1351370
Tracking Status
fennec 53+ ---
firefox53 --- affected

People

(Reporter: bc, Assigned: jwwang)

References

(Blocks 1 open bug)

Details

(Keywords: crash, hang, platform)

Crash Data

Testing Pixels on the unit tests https://treeherder.allizom.org/#/jobs?repo=mozilla-central&filter-searchStr=autophone&tochange=28eb27fd7d141a8f9f2a6dbfae892ad78ecb40a5&fromchange=05328d3102efd4d5fc0696489734d7771d24459f&selectedJob=5481777 

found the opt

REFTEST PROCESS-CRASH | http://192.168.1.50:36797/tests/layout/base/crashtests/1235467-1.html | application crashed [@ mozalloc_abort]

11-28 17:44:11.147 I/Gecko (32437): FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"MediaShutdownManager: shutdown","state":"(none)","filename":"/builds/slave/m-cen-and-api-15-0000000000000/build/src/dom/media/MediaShutdownManager.cpp","lineNumber":75,"stack":"MediaShutdownManager shutdown"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive proc
11-28 17:44:11.151 I/Gecko (32437): [32437] ###!!! ABORT: file /builds/slave/m-cen-and-api-15-0000000000000/build/src/dom/media/MediaShutdownManager.cpp, line 75
11-28 17:44:11.151 E/Gecko (32437): mozalloc_abort: [32437] ###!!! ABORT: file /builds/slave/m-cen-and-api-15-0000000000000/build/src/dom/media/MediaShutdownManager.cpp, line 75

https://treeherder.allizom.org/logviewer.html#?job_id=5480959&repo=mozilla-central#L6824

and debug

REFTEST PROCESS-CRASH | http://192.168.1.50:36441/tests/layout/base/crashtests/1234622-1.html | application crashed [@ mozalloc_abort]

https://treeherder.allizom.org/logviewer.html#?job_id=5481777&repo=mozilla-central#L6832

11-30 20:20:39.314 I/Gecko (30204): FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"MediaShutdownManager: shutdown","state":"(none)","filename":"/home/worker/workspace/build/src/dom/media/MediaShutdownManager.cpp","lineNumber":75,"stack":"MediaShutdownManager shutdown"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources
11-30 20:20:39.323 I/Gecko (30204): [30204] ###!!! ABORT: file /home/worker/workspace/build/src/dom/media/MediaShutdownManager.cpp, line 75
11-30 20:20:39.324 E/Gecko (30204): mozalloc_abort: [30204] ###!!! ABORT: file /home/worker/workspace/build/src/dom/media/MediaShutdownManager.cpp, line 75
https://treeherder.allizom.org/logviewer.html#?job_id=5480959&repo=mozilla-central#L6824

Thread 55
 0  libc.so + 0x17418
     r0 = 0xba6058f4    r1 = 0x00000089    r2 = 0x00000004    r3 = 0x00000000
     r4 = 0x00000000    r5 = 0xffffffff    r6 = 0x00000000    r7 = 0x000000f0
     r8 = 0xba285bf0    r9 = 0xbf87f978   r10 = 0x00000004   r12 = 0xbf87f678
     fp = 0x00000004    sp = 0xbf87f668    lr = 0xe71e1b8f    pc = 0xe71b2418
    Found by: given as instruction pointer in context
 1  libxul.so!__aeabi_fcmpgt + 0x81356d
     sp = 0xbf87f6a0    pc = 0xcb7bfd6c
    Found by: stack scanning
 2  libnss3.so!PR_WaitCondVar [ptsynch.c:05328d3102ef : 396 + 0x3]
     sp = 0xbf87f6b0    pc = 0xcb86c195
    Found by: stack scanning
 3  libxul.so!mozilla::MonitorAutoLock::Wait [CondVar.h:05328d3102ef : 79 + 0x5]
     r4 = 0x80004005    r5 = 0xbb417e58    r6 = 0xbf87f730    sp = 0xbf87f6c0
     pc = 0xc97ddd85
    Found by: call frame info
 4  libxul.so!mozilla::MediaCodecDataDecoder::Flush [MediaCodecDataDecoder.cpp:05328d3102ef : 774 + 0x7]
     r3 = 0x00000001    r4 = 0xbf87f6d4    r5 = 0xbb417de0    r6 = 0xbf87f730
     sp = 0xbf87f6d0    pc = 0xca18c589
    Found by: call frame info
 5  libxul.so!mozilla::MediaFormatReader::DecoderFactory::Data::~Data [MediaFormatReader.cpp:05328d3102ef : 221 + 0x3]
     r4 = 0xba635534    r5 = 0xba635520    r6 = 0xbf87f730    sp = 0xbf87f6e8
     pc = 0xca11eadd
    Found by: call frame info
 6  libxul.so!mozilla::MediaFormatReader::Shutdown [MediaFormatReader.cpp:05328d3102ef : 188 + 0x7]
     r4 = 0xbb26b000    r5 = 0xba635520    r6 = 0xbf87f730    sp = 0xbf87f6f0
     pc = 0xca1389ab
    Found by: call frame info

Any idea?
Flags: needinfo?(jolin)
There seems no call stack of 'MC Decoder' thread [1] in the log. Could it be that decoder was created but not initialized yet?

[1] http://searchfox.org/mozilla-central/source/dom/media/platforms/android/MediaCodecDataDecoder.cpp#404
Flags: needinfo?(jolin)
http://searchfox.org/mozilla-central/rev/0c055ccbcf96846044fc9a71421bd9b7978686f7/dom/media/MediaFormatReader.cpp#221

This line is added by bug 1316506 which regressed bug 1293572.

Why would we want to flush a decoder that is never used for decoding?
Flags: needinfo?(jyavenard)
Something to do to ensure that the OOP decoder won't dispatch new operations after it has been shutdown. 

We could I guess check if a decoder has been used or not, and if so only flush it as required.
Flags: needinfo?(jyavenard) → needinfo?(matt.woodrow)
When we shut down MFR, we block on the decoder task queues and wait for them to be completely empty.

This doesn't work for RemoteVideoDecoder since the work isn't happening on the decoder task queue, it's happening in a different process.

If we shut down MFR while there are still tasks (of any sort) running on the GPU process decode task queue, then we can get a task posted back to MediaFormatReader using the MediaDataDecoderCallback pointer, which isn't refcounted and could be dangling.

We fixed this by inserting a Flush before we destroy decoders since Flush is synchronous and should do the same as waiting on the task queue to be empty.

We also made RemoteVideoDecoder::Shutdown clear the mCallback pointer so that it's never dangling.

Some options for fixing this:

* Keep track of suspended state, and if we've ever tried to decode with a decoder, and only issue the Flush if there might be things to flush.

* Stop calling Flush and rely on dropping the pointer in Shutdown to be sufficient. This will discard any messages that come in after Shutdown, rather than flushing them, but that's probably ok.

* Fix the underlying issue where calling Flush() needlessly causes a hang.
Flags: needinfo?(matt.woodrow)
http://searchfox.org/mozilla-central/rev/0c055ccbcf96846044fc9a71421bd9b7978686f7/dom/media/MediaFormatReader.cpp#221
The decoder held by DecoderFactory is never used for decoding.

http://searchfox.org/mozilla-central/rev/0c055ccbcf96846044fc9a71421bd9b7978686f7/dom/media/MediaFormatReader.cpp#416
DecoderFactory won't pass the decoder to MFR until Init() promise is resolved. And decoding starts after that.

So a quick fix is remove #221.
Assignee: nobody → jwwang
tracking-fennec: ? → 53+
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #6)

> So a quick fix is remove #221.

but then you'll cause bug 1316506 again...
(In reply to Jean-Yves Avenard [:jya] from comment #7)
> (In reply to JW Wang [:jwwang] [:jw_wang] from comment #6)
> 
> > So a quick fix is remove #221.
> 
> but then you'll cause bug 1316506 again...

Do we always need to flush an OOP decoder before shutdown even if it hasn't started decoding at all?
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #8)
> (In reply to Jean-Yves Avenard [:jya] from comment #7)
> > (In reply to JW Wang [:jwwang] [:jw_wang] from comment #6)
> > 
> > > So a quick fix is remove #221.
> > 
> > but then you'll cause bug 1316506 again...
> 
> Do we always need to flush an OOP decoder before shutdown even if it hasn't
> started decoding at all?

:mattwoodrow would know more. At a guess I would say no.

Flush is a synchronous IPDL, unlike all the others that are asyncronous.

We don't want the OOP MediaDataDecoder to use the callback after it's been shutdown.
So I'd say it all depends on how the initialisation is handled with the OOP, and if that too is an asynchronous IPDL.
E.g. what happens if you call Init, don't wait for Init to complete and call shutdown.
Flags: needinfo?(matt.woodrow)
Sorry for the slow reply, no you don't. We shouldn't need to flush at all now, just make sure we call Shutdown.
Flags: needinfo?(matt.woodrow)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.