Closed Bug 791167 Opened 7 years ago Closed 7 years ago

SIGSEGV when decoding H.264 videos using hardware codec

Categories

(Core :: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla18
blocking-basecamp +

People

(Reporter: cajbir, Assigned: cjones)

Details

Attachments

(1 file, 1 obsolete file)

Requires fix to Bug 791164 first to reproduce since without that the hardware decoder is not used.

Steps to reproduce:

1) Start browser
2) Visit http://cd.pn/b2
3) Press Play

Expected result:

Video plays

Actual result:

Browser tab crashes and shows sad face. logcat shows:

D/memalloc( 1225): /dev/pmem: Allocated buffer base:0x4ad00000 size:348160 offset:3944448 fd:178
D/memalloc( 1349): /dev/pmem: Mapped buffer base:0x46465000 size:4292608 offset:3944448 fd:57
F/libc    ( 1349): Fatal signal 11 (SIGSEGV) at 0x4687d000 (code=1)

Commenting out the code in nsBuiltDecoderReader.cpp that is special cased for the YV12 format like so works around the issue:

ImageFormat format[2] = {PLANAR_YCBCR, GRALLOC_PLANAR_YCBCR};
#if 0
  if (IsYV12Format(Y, Cb, Cr)) {
    v->mImage = aContainer->CreateImage(format, 2); 
  } else {   
#endif 
    v->mImage = aContainer->CreateImage(format, 1);
#if 0
  }
#endif

The special case code seems to have been introduced in bug 767480.
Depends on: 791164
blocking-basecamp: --- → ?
Blocks: 765977
blocking-basecamp: ? → +
Over to doublec for re-assignment :)
Assignee: nobody → chris.double
Attached patch Workaround (obsolete) — Splinter Review
This is my temporary workaround to get H264 video to play in the video app using the hardware decoder. With this applied and the workaround in bug 791164 applied I get http://cd.pn/b2/story.mp4 playing from the video app when I copy it to the /sdcard/Movies directory.
No longer blocks: 765977
No longer depends on: 791164
Updating to the latest repo code and this crash doesn't occur with bug 759506 applied. It only occurs when using the omx media plugins backend.
That means that we'll regress SW-decoded playback with that patch, right?
That patch isn't a fix, it's just to show where the crash happens and commenting it out stops it from happening. I'm not sure what you mean by it regressing software decoder playback.

Kanru would be able to comment on what his fast path actually does.
That work enables using gralloc as our transport for decoded frames so we don't have to reupload them on the compositor.  If the bug 759506 has a fork of that code from before bug 767480, then I suspect it might be missing the bug 767480 optimization.

Or, if you're saying that you guys took the latest omx plugin code and then re-applied edwin's transformations on it to internalize it, then maybe the HW-decoded path is skipping Kan-Ru's new code and we'll only use it for CPU-decoded videos.  In which case, all is well :).
Bug 759506 is not hitting the crash with this bug. It's the media/omx-plugin code that is hitting the crash. So with edwin's code applied we don't have a problem. Which means all is well in b2g land I think.
We had a quick IRC powwow.  It looks like the code in bug 767480 added repacking code to make buffers published by our internal decoders directly texturable through gralloc.  What's probably happening is that the omx buffers don't adhere to the preconditions needed for the repacking, and we're probably reading out-of-bounds memory.  Something like bug 785441.

I'm ok with temporarily disabling this code to get omx working+landed, but reenabling it is a blocker; vp8 and theora are borderline unplayable without bug 767480.
Forgot to add --- step one here is to run this code in a debug build.  If an assertion doesn't loudly blow up, we need to add that assertion.
I forgot to attach gdb when repro'ing, but in logcat I see

F/libc    (28965): Fatal signal 11 (SIGSEGV) at 0x46a15000 (code=1)

I/DEBUG   (  109): memory map around addr 46a15000:
I/DEBUG   (  109): 4677d000-46a15000 /dev/pmem
I/DEBUG   (  109): (no map for address)
I/DEBUG   (  109): 46ab5000-46ab6000 

which means we're attempting to read or write past the end of a gralloc buffer.
(gdb) bt
#0  memcpy () at bionic/libc/arch-arm/bionic/memcpy.S:225
#1  0x41b8f66a in mozilla::layers::GrallocPlanarYCbCrImage::SetData (this=0x44d49fc0, aData=...) at /home/cjones/mozilla/inbound/gfx/layers/GrallocImages.cpp:82
#2  0x4110272e in VideoData::Create (aInfo=..., aContainer=0x44fea330, aOffset=163147, aTime=0, aEndTime=1, aBuffer=..., aKeyframe=true, aTimecode=-1, aPicture=...) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderReader.cpp:237
#3  0x41129ad8 in nsMediaPluginReader::DecodeVideoFrame (this=0x44df5500, aKeyframeSkip=@0x44838c4f, aTimeThreshold=0) at /home/cjones/mozilla/inbound/content/media/plugins/nsMediaPluginReader.cpp:204
#4  0x41101d1a in nsBuiltinDecoderReader::DecodeVideoFrame (this=0x44df5500) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderReader.h:507
#5  0x41103724 in nsBuiltinDecoderReader::DecodeToFirstData<VideoData> (this=0x44df5500, aDecodeFn=(bool (nsBuiltinDecoderReader::*)(nsBuiltinDecoderReader *)) 0x41101cf1 <nsBuiltinDecoderReader::DecodeVideoFrame()>, aQueue=...) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderReader.h:497
#6  0x41102a4e in nsBuiltinDecoderReader::FindStartTime (this=0x44df5500, aOutStartTime=@0x44838cf8) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderReader.cpp:289
#7  0x410ff3de in nsBuiltinDecoderStateMachine::FindStartTime (this=0x4377eef0) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderStateMachine.cpp:2345
#8  0x410fd040 in nsBuiltinDecoderStateMachine::DecodeMetadata (this=0x4377eef0) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderStateMachine.cpp:1742
#9  0x410f7cd2 in nsBuiltinDecoderStateMachine::DecodeThreadRun (this=0x4377eef0) at /home/cjones/mozilla/inbound/content/media/nsBuiltinDecoderStateMachine.cpp:479
#10 0x404d9eb6 in nsRunnableMethodImpl<void (nsUpdateProcessor::*)(), true>::Run (this=0x48fe8640) at ../../dist/include/nsThreadUtils.h:349
#11 0x41a4d61a in nsThread::ProcessNextEvent (this=0x452f3be0, mayWait=true, result=0x44838e87) at /home/cjones/mozilla/inbound/xpcom/threads/nsThread.cpp:624
#12 0x419efd52 in NS_ProcessNextEvent_P (thread=0x452f3be0, mayWait=true) at /home/cjones/mozilla/new-b2g/objdir-gecko/xpcom/build/nsThreadUtils.cpp:220
#13 0x41a4c78a in nsThread::ThreadFunc (arg=0x452f3be0) at /home/cjones/mozilla/inbound/xpcom/threads/nsThread.cpp:257
#14 0x401d3426 in _pt_root (arg=0x452f3c50) at /home/cjones/mozilla/inbound/nsprpub/pr/src/pthreads/ptthread.c:156
#15 0x400eb0ec in __thread_entry (func=0x401d3355 <_pt_root>, arg=0x452f3c50, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#16 0x400eac40 in pthread_create (thread_out=<value optimized out>, attr=0x493ffb18, start_routine=0x401d3355 <_pt_root>, arg=0x452f3c50) at bionic/libc/bionic/pthread.c:357
#17 0x00000000 in ?? ()
(gdb) f 1
#1  0x41b8f66a in mozilla::layers::GrallocPlanarYCbCrImage::SetData (this=0x44d49fc0, aData=...) at /home/cjones/mozilla/inbound/gfx/layers/GrallocImages.cpp:82
(gdb) p mData
$1 = {
  mYChannel = 0x4a3f7000 <Address 0x4a3f7000 out of bounds>, 
  mYStride = 640, 
  mYSize = {
    <mozilla::gfx::BaseSize<int, nsIntSize>> = {
      width = 640, 
      height = 360
    }, <No data fields>}, 
  mYSkip = 0, 
  mCbChannel = 0x4a433001 <Address 0x4a433001 out of bounds>, 
  mCrChannel = 0x4a433000 <Address 0x4a433000 out of bounds>, 
  mCbCrStride = 640, 
  mCbCrSize = {
    <mozilla::gfx::BaseSize<int, nsIntSize>> = {
      width = 320, 
      height = 180
    }, <No data fields>}, 
  mCbSkip = 1, 
  mCrSkip = 1, 
  mPicX = 0, 
  mPicY = 0, 
  mPicSize = {
    <mozilla::gfx::BaseSize<int, nsIntSize>> = {
      width = 640, 
      height = 360
    }, <No data fields>}, 
  mStereoMode = mozilla::STEREO_MODE_MONO
}

/proc/[]/maps says that these addresses are

4a177000-4a8a2000 rw-s 0d300000 00:0b 906        /dev/pmem_adsp

Poking around to see what's out of sync here ...
I haven't been able to determine the underlying problem yet, but I suspect it will boil down to gecko having a different usage of "stride" than gralloc.  gralloc means "pixel stride", but gecko means "byte stride".

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp#308
Found it.  I think.
Assignee: chris.double → jones.chris.g
Component: General → Video/Audio
Product: Boot2Gecko → Core
Comment on attachment 662097 [details] [diff] [review]
stride*height is the size of the region

Review of attachment 662097 [details] [diff] [review]:
-----------------------------------------------------------------

Ouch!
Attachment #662097 - Flags: review?(roc) → review+
https://hg.mozilla.org/mozilla-central/rev/5560ff65b090
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Keywords: verifyme
QA Contact: jsmith
Keywords: verifyme
QA Contact: jsmith
You need to log in before you can comment on or make changes to this bug.