Closed Bug 834150 Opened 12 years ago Closed 12 years ago

crash in mozilla::layers::GrallocBufferActor::GetFrom

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)

RESOLVED FIXED
B2G C4 (2jan on)
blocking-b2g tef+
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: m1, Assigned: cjones)

Details

(Keywords: crash, Whiteboard: [b2g-crash][cr 433107] QARegressExclude)

Crash Data

Attachments

(4 files, 1 obsolete file)

This main process crash is seen during camcorder and video stability test.

This report is from AU 182.   The crash address varies (0x30,0x3f,0x45,0x7aa11d8) but always same frames.

Top frames (.extra file unhelpful)
----------------------------------
Crash reason:  SIGSEGV
Crash address: 0x30

Thread 92 (crashed)
 0  libxul.so!mozilla::layers::GrallocBufferActor::GetFrom [StrongPointer.h : 128 + 0x0]
     r4 = 0x44ce5b6c    r5 = 0x44ce5ba4    r6 = 0x44ce5d20    r7 = 0x4b3287c0
     r8 = 0x44ce5d78    r9 = 0x00000000   r10 = 0x00000120    fp = 0x000000f0
     sp = 0x44ce5b48    lr = 0x4123a75d    pc = 0x4123a082
    Found by: given as instruction pointer in context
 1  libxul.so!mozilla::layers::GrallocPlanarYCbCrImage::SetData [GrallocImages.cpp : 51 + 0x9]
     r4 = 0x4a354d40    r5 = 0x44ce5ba4    r6 = 0x44ce5d20    r7 = 0x4b3287c0
     r8 = 0x44ce5d78    r9 = 0x00000000   r10 = 0x00000120    fp = 0x000000f0
     sp = 0x44ce5b50    pc = 0x4123a75d
    Found by: call frame info
 2  libxul.so!VideoData::Create [nsBuiltinDecoderReader.cpp : 238 + 0x9]
     r4 = 0x4a354d40    r5 = 0x480665a4    r6 = 0x44ce5d20    r7 = 0x4b3287c0
     r8 = 0x44ce5d78    r9 = 0x00000000   r10 = 0x00000120    fp = 0x000000f0
     sp = 0x44ce5ba0    pc = 0x40e54107
    Found by: call frame info
 3  libxul.so!nsMediaOmxReader::DecodeVideoFrame [nsMediaOmxReader.cpp : 223 + 0x3d]
     r4 = 0x48066520    r5 = 0x00000000    r6 = 0x00000001    r7 = 0x00000000
     r8 = 0x00000000    r9 = 0x00000160   r10 = 0x00000120    fp = 0x000000f0
     sp = 0x44ce5c58    pc = 0x40e239fb
    Found by: call frame info
We're running camera capture *and* video playback in the main process? :(
Oh, sorry if I implied that.  Was just trying to be helpful in saying that this was a main process crash.  We're not /monkey/ing around with where apps normally run!
Sorry on my side too --- I certainly didn't mean to imply you guys changed the default behavior.  Was just lamenting that that's what it seems to be, according to your crash report.
Severity: normal → critical
Crash Signature: [@ mozilla::layers::GrallocBufferActor::GetFrom]
Keywords: crash
Whiteboard: [cr 433107] → [b2g-crash][cr 433107]
Assignee: nobody → mhabicher
blocking-b2g: tef? → tef+
If time is of the essence, someone with experience with the video decoder should take a look at this.
Assignee: mhabicher → nobody
Jeff, remember when you said you were able to help?  :)
Assignee: nobody → jmuizelaar
So the fact that we're getting different addresses here suggests that we are probably hitting the different cases in ::GetFrom(). I expect we might be able to catch this a little earlier with some appropriately placed assertions.
I've scanned all the test cases that reproduce this crash and they are all a potpourri of messing with camera/camcorder/video/music/etc.   Will try to see if we can get this narrow down somewhat at least to one app.   Turns out we been sitting on this crash for a while accidentally, :-(, but in previous instances we  were seeing this exact backtrace in a plugin-container but now we see it in b2g.  Has video decoding and/or camera recently been moved to the main process or something?
No, but I recall in another bug you mentioning tests of the lockscreen camera + playback of preview videos, which alas seems to still be in the main process.  Sigh.
I wonder if this could be triggered by OOM?

This patch at least makes GrallocBufferActor::GetFrom return without crashing. The caller in this stack, GrallocPlanarYCbCrImage::SetData, does check already for null so it seems likely we could continue without crashing somewhere else immediately.
Assignee: jmuizelaar → roc
Attachment #706253 - Flags: review?(jones.chris.g)
Kan-Ru might be able to help with this code.
Comment on attachment 706253 [details] [diff] [review]
wallpaper-ish patch

*shrug*
Attachment #706253 - Flags: review?(jones.chris.g) → review+
Just talked to test folks and they say it's pretty /easy/ to reproduce manually, and would share STR by morning PST after verifying it's still seen at the tip.
Alright, let's try Kan-Ru for analysis once we have STR :-).
Assignee: roc → kchen
Comment on attachment 706253 [details] [diff] [review]
wallpaper-ish patch

Review of attachment 706253 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp
@@ +323,5 @@
>      gba = static_cast<GrallocBufferActor*>(child);
>    } else if (PGrallocBufferParent* parent = aDescriptor.bufferParent()) {
>      gba = static_cast<GrallocBufferActor*>(parent);
> +  } else {
> +    return sp<GraphicBuffer>(nullptr);

Probably worth asserting that this doesn't happen if we don't expect it to happen, even if we take this patch.
Here's what I got for STR:
1. Taking Pictures and recording few videos and play the recorded videos.
2. After playing the videos, trying to open the gallery and scroll for the pictures in gallery. 
3. Switch from the Gallery to camera and repeat step 1 and 2.

But apparently this was easier to hit manually with these STR in older builds.  Automation did catch this again on AU 185 though so looks like it's still lingering somewhere.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #15)
> But apparently this was easier to hit manually with these STR in older
> builds.  Automation did catch this again on AU 185 though so looks like it's
> still lingering somewhere.

Can you share that crash report?
Chris, we don't have that crash report yet. When I was following the STR it didn't crash for me but the gallery app UI went in weird state as you can see in this attachment. It doesn't come to foreground to occupy the whole screen instead it stays of that same size as in the pic and all the soft buttons are functional. May be different issue.
(Btw, you use the button combo HOME + POWER to take a real screenshot hot off the framebuffer, with proper rotation etc. ;) .)

A little hard to tell but this almost looks like the gallery is being show in the task switcher?  At any rate, yeah, looks like a separate bug.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18)
> (Btw, you use the button combo HOME + POWER to take a real screenshot hot
> off the framebuffer, with proper rotation etc. ;) .)

Great. Thanks for the tip.

> 
> A little hard to tell but this almost looks like the gallery is being show
> in the task switcher?  At any rate, yeah, looks like a separate bug.

Yes, it's trying to switch to the gallery app but doesn't finish doing it and gallery app get's stuck at the same size. Created bug 834875.
I dug through one of the logs we captured (1.2GB) and see this output consistently before the main process crashes.  I also see either camera or video log activity as well. 

 E memalloc: /dev/pmem: No more pmem available
 E msm7627a.gralloc: gralloc failed err=Out of memory
 W GraphicBufferAllocator: alloc(352, 288, 842094169, 00000133, ...) failed -12 (Out of memory)

Not sure why we are running gralloc memory in this scenario, maybe that's a separate bug
I cannot reproduce following this STR.

(In reply to Michael Vines [:m1] [:evilmachines] from comment #15)
> Here's what I got for STR:
> 1. Taking Pictures and recording few videos and play the recorded videos.

How to play the recorded videos here? Click the thumbnail on top of the viewfinder or switch to gallery?

> 2. After playing the videos, trying to open the gallery and scroll for the
> pictures in gallery. 
> 3. Switch from the Gallery to camera and repeat step 1 and 2.

I assume the switching method here is to use the card view?
We been able to narrow this down finally to a single video, 720p, that causes the video app to exhibit this crash stack.

The video itself is 44MB so it's been put in Dietrich's special location.  

Inder will take quick look as well to debug from the OMX end.
Flags: needinfo?(ikumar)
(oh, wallpaper-ish patch didn't help the situation just for fyi)
Took a look and it appears that google omx decoder is running into an issue:
W/OMXCodec(  409): Failed to set frame packing format on component
But, gecko is not handling this error case properly and resulting into a crash.
Flags: needinfo?(ikumar)
Hmm.. I couldn't see the thumbnail in the Video app and were running out of pmem very quickly after I opend the Video app.
That sounds like what I see as well with that .mp4 on the sd card.   Running out of pmem is expected because we haven't allocated enough for 720p.
Hold up --- gecko upstream cowardly won't even attempt to send 720p videos to the HW decoder.  Based on comment 24, it sounds like we have similar configs.

So we must be getting "lucky" with a video that the crappy SW decoder thinks it can decode.  Maybe we should just never bother touching that decoder.  I haven't seen it do anything useful yet.
Stop the illusion that we can actually fall back on the SW decoder.

Not tested; does this make the problems go away?
Attachment #707415 - Flags: feedback?(kchen)
Attachment #707415 - Flags: feedback?(ikumar)
This SW decoder seems to be out to get us.   I applied the patch and same result.   Then looking at the frame size reported by the SW decoder I found (w=352, h=288).   The clip is clearly 720p when played on desktop.  So either 
(1) first couple frames are < 720p 
(2) SW decoder is confused

Somewhat supporting the theory of #2 is comment 24.    Inder, might need to dig into the impl of this guy a little more to determine if it's (1) or (2)
OK, maybe we should use the HwDecodersOnly flag (whatever that was, I forget).
(kHardwareCodecsOnly)
Looks like |flags = kHardwareCodecsOnly;| gets us to the TEF+ win.   No crash, and the video doesn't show up in Video app list.
(Still not tested by me, but this is the code that m1 said should be working.)
Attachment #707415 - Attachment is obsolete: true
Attachment #707415 - Flags: feedback?(kchen)
Attachment #707415 - Flags: feedback?(ikumar)
Attachment #707479 - Flags: review?
Attachment #707479 - Flags: feedback?
Comment on attachment 707479 [details] [diff] [review]
Just Say No to the fallback SW decoder, v2

Thanks, bz!
Attachment #707479 - Flags: review?(chris.double)
Attachment #707479 - Flags: review?
Attachment #707479 - Flags: feedback?(mvines)
Attachment #707479 - Flags: feedback?
Comment on attachment 707479 [details] [diff] [review]
Just Say No to the fallback SW decoder, v2

Yep, that does the trick.
Attachment #707479 - Flags: feedback?(mvines) → feedback+
Marking status-b2g18 and status-b2g18-v1.0.0 as affected, please update the status to fixed once this is verified landed on v1-train and v1.0.0
Comment on attachment 707479 [details] [diff] [review]
Just Say No to the fallback SW decoder, v2

(-> shotgun review in case doublec is OOO.)
Attachment #707479 - Flags: review?(kinetik)
Attachment #707479 - Flags: review?(edwin)
Comment on attachment 707479 [details] [diff] [review]
Just Say No to the fallback SW decoder, v2

Review of attachment 707479 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/OmxDecoder.cpp
@@ +232,5 @@
>    if (videoTrackIndex != -1 && (videoTrack = extractor->getTrack(videoTrackIndex)) != nullptr) {
> +    // Experience with OMX codecs is that only the HW decoders are
> +    // worth bothering with, at least on the platforms where this code
> +    // is currently used, and for formats this code is currently used
> +    // for (h.264).  So if we don't gett a hardware decoder, just give

Spelling of 'gett'.
Attachment #707479 - Flags: review?(kinetik)
Attachment #707479 - Flags: review?(edwin)
Attachment #707479 - Flags: review?(chris.double)
Attachment #707479 - Flags: review+
reassign to cjones :)
Assignee: kchen → jones.chris.g
https://hg.mozilla.org/mozilla-central/rev/dedbcc774413
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → B2G C4 (2jan on)
does not make sense to create a regression issue.
Flags: in-moztrap-
Whiteboard: [b2g-crash][cr 433107] → [b2g-crash][cr 433107] QARegressExclude
Cannot verify, need steps to blackbox test the issue.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: