Closed Bug 900012 Opened 11 years ago Closed 11 years ago

crash in mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)

Categories

(Core :: Graphics: Layers, defect)

All
Android
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla25

People

(Reporter: zcampbell, Assigned: nical)

References

Details

(4 keywords, Whiteboard: [b2g-crash][fromAutomation])

Crash Data

Attachments

(1 file, 1 obsolete file)

Just came in on today's mc engineering build:


Gecko  http://hg.mozilla.org/mozilla-central/rev/c2b375f3a909
Gaia  9bfceaa90e8b92a379432b67121afa3cd3f14c90
BuildID 20130731030205
Version 25.0a1


Steps to replicate:
1. Load camera app
2. Take a photo
3. Tap gallery app
4. Device will crashy crash
An additional crash report with a different STR

https://crash-stats.mozilla.com/report/index/e71dfc17-680c-4398-9ba2-1f4392130731

1. adb push videofile.mp4 sdcard/
2. Load videoplayer app
3. Wait for pushed video file to appear
4. Tap video file to play
5. Device says no.

Replicated manually.
Automation shows it crashes with other video formats too.
Whiteboard: [fromAutomation] → [b2g-crash][fromAutomation]
Component: General → Graphics: Layers
Product: Boot2Gecko → Core
Version: unspecified → Trunk
Joe - Any ideas on what caused this regression? We just saw this on today's m-c b2g build.
Flags: needinfo?(joe)
More simple steps:
1. Load camera app
2. Tap on the video icon to switch source
I don't, but perhaps Nical does.
Flags: needinfo?(joe) → needinfo?(nical.bugzilla)
Looks like this might be caused by the work going on in bug 858914.
(In reply to Jason Smith [:jsmith] from comment #5)
> Looks like this might be caused by the work going on in bug 858914.

Actually, Bug 858914 is aiming at fixing this kind of bugs.
So far, nothing of the work in 858914 that landed affects B2G (it is preffed off and I am still working on the B2G part of that bug).

Another bug that is worth looking at is Bug 879681 where we want to remove PGrallocBufferActor which is a source off so many problems. But it's not trivial and I have a lot to do with 858914 already so I wouldn't expect a patch for 879681 in the very short term (except if someone wants to take over this bug).
Depends on: 879681, 858914
Flags: needinfo?(nical.bugzilla)
What device?  Also, this implies that it just showed up in 2013-07-31, but can we confirm that it was ok on 2013-07-30 or not?
Flags: needinfo?(zcampbell)
Unagi device (this is in the crash report too)

This crash was definitely not in yesterday's 2013-07-30 master build. 
This showed up in our first test automation run on the 2013-07-31 build today and hit 5-6 of our tests pretty hard; it was quite visible :)
Flags: needinfo?(zcampbell)
So that makes the regression range between 7/30/2013 - 7/31/2013 on the gecko side.

I see that a large patch landed from graphics layers in bu 866232. I see a smaller patch that landed in bug 899730 as well.
Nical's bug 898914 being pref'd off non-withstanding, I'd still like to confirm that didn't break it. Benoit, can you do a quick test, Nical's probably gone for the day. It's this one: http://hg.mozilla.org/mozilla-central/rev/313445f455f3
Flags: needinfo?(bjacob)
(In reply to Milan Sreckovic [:milan] from comment #11)
> So,
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=3d40d270c031&tochange=c2b375f3a909 as the regression
> window?

Looks like it.
Issue looks similar to issue in bug 880780 + crash; after user launching camera app he see green screen before crash. 
Crash ID: bp-0309ef6f-4c90-4c17-9710-4a8c82130731
(In reply to Milan Sreckovic [:milan] from comment #12)
> Nical's bug 898914 being pref'd off non-withstanding, I'd still like to
> confirm that didn't break it. Benoit, can you do a quick test, Nical's
> probably gone for the day. It's this one:
> http://hg.mozilla.org/mozilla-central/rev/313445f455f3

Confirmed that this cset is responsible for this regression.

With 313445f455f3, camera app locks up (Camera process still visible in adb shell b2g-ps, though)

With the parent cset 91837985ae91, camera app works fine.

-> regressed by bug 858914
Flags: needinfo?(bjacob)
http://youtu.be/1Do6SXtKn9M video of issue happening.
Thanks Benoit. So can we backout the changeset in question here to get this working again?
(In reply to Jason Smith [:jsmith] from comment #17)
> Thanks Benoit. So can we backout the changeset in question here to get this
> working again?

RyanVM is currently working on backing out the changeset in question.
This is a huge changeset, I have no idea how hard it will be to back out. Its author is in Europe, too, so I wouldn't him to be available until tomorrow.
(In reply to Benoit Jacob [:bjacob] from comment #19)
> This is a huge changeset, I have no idea how hard it will be to back out.
> Its author is in Europe, too, so I wouldn't him to be available until
> tomorrow.

Yup. Discussed in IRC - nical is going to look into this first thing tomorrow his time.
Crash Signature: [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] → [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] [@ nsXPConnect::XPConnect()]
The crash happens in some hacky code that was written in order to handle corner cases where the lifetime of Gralloc buffers is hard to track: When we set a gralloc buffer to a TextureHost we register this texture host to the the gralloc buffer so that the latter can ask one last thing (ForgetBuffer()) to the TextureHost before being destroyed.

It looks very much like the registered TextureHost is destroyed when the GrallocBufferActor's destructor tries to call forget buffer on it (dangling pointer crash).

bjacob you understand this code better than me so please let me know if I missed something.

In GrallocDeprecatedTextureHostOGL::SwapTextureImpl, we register to the gralloc buffer we just received, but we don't unregister from the previous one.

Also we don't seem to set the previous gralloc as the back buffer of the TextureHost. if we were doing any of those two i'd probably understand how the thing holds together but right now it's like we have always been lucky that the TextureHost would outlibe the GrallocBufferActor.

The funkiness here is that (I think) double buffering at the texture host level is only used in ImageLayers for async video. So subtle brokenness in GrallocDeprecatedTextureHost's double buffering will not break ThebesLayers (hich is the most exercised code path and probably where we were testing the register gralloc hack thingy).

Also Benoit, why do we need to call ForgetBuffer() in the GrallocBufferActor's destructor?
what harm is there to keep the android::GraphicBuffer alive a bit longer?
We are looking for a solution today (Aug 1); if not found, we will prepare a backout for tomorrow (Aug 2).
This patch reverts a one-line chunk that should not have been part of the patch part 4 in 858914, that made async-video single buffered. It fixes the crash, although I am not sure why. double buffering for async video was meant to avoid flashing on desktop.
Assignee: nobody → nical.bugzilla
Attachment #784606 - Flags: review?(matt.woodrow)
Comment on attachment 784606 [details] [diff] [review]
make async-video double buffered.

Review of attachment 784606 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPlatform.h
@@ +564,5 @@
>       */
>      bool PreferMemoryOverShmem() const;
> +    bool UseDeprecatedTextures() const {
> +      if (!mLayersUseDeprecated) MOZ_CRASH("SHOULD USE DEPRECATED TEXTURES!");
> +      return mLayersUseDeprecated;

Is this intentional? I'm hoping to have this enabled on OSX asap.
Attachment #784606 - Flags: review?(matt.woodrow) → review+
The patch without the debug assertion
Attachment #784606 - Attachment is obsolete: true
The image in the screen is still all messed up (green) on my unagi, though. Did we already have that before 858914? If I take pictures or videos, the image/video that is saved is correct, but not the image on the screen while I am filming.
Whiteboard: [b2g-crash][fromAutomation] → [b2g-crash][fromAutomation][leave-open]
(In reply to Nicolas Silva [:nical] from comment #27)
> The image in the screen is still all messed up (green) on my unagi, though.
> Did we already have that before 858914? If I take pictures or videos, the
> image/video that is saved is correct, but not the image on the screen while
> I am filming.

Yup. That's bug 880780.
With the patch applied, I also can't reproduce the crash described in comment 1.
Whiteboard: [b2g-crash][fromAutomation][leave-open] → [b2g-crash][fromAutomation]
https://hg.mozilla.org/mozilla-central/rev/0c2f8e48c612
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Works for me on Hamachi now.
Crash Signature: [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] [@ nsXPConnect::XPConnect()] → [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] [@ mozilla::docshell::POfflineCacheUpdateParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::…
Keywords: topcrash
OK Scoobi, you are in luck, I have replicated it! Totally by coincidence I might add..

STR:
1. Push an image to the sdcard/gallery
2. Open contacts app
3. Tap the 'Add picture' to add a pic/avatar
4. Select 'From gallery'
5. Choose the image pushed in step 1
6. Crop the image a bit
7. Tap the tickbox
Device will crash.

Device: Unagi
Build:
Gecko  http://hg.mozilla.org/mozilla-central/rev/ad0ae007aa9e
Gaia  0ca0dba246d1372eb815772c8108395ab0abd0a4
BuildID 20130805070203
Version 25.0a1
The crash here happens in some very badly broken code that tries to stitch together even more broken code. It is in a fragile equilibrium and it will break again until we get to fix the architectural badness that got us in this situation. Ironically bug 858914 is largely motivated by fixing the broken ownership model of gralloc buffers (the code that crashed) and it also incidentally modified code that led this to regress (the modification has been backed out by the patch in this bug).

My point is: Ownership model of gralloc buffers is badly broken and IMO there are two things that can really help in the long run:
* Bug 858914 Get the new textures to work with gralloc (what i am working on at the moment) which will remove a portion of the brokenness.
* Bug 879681 remove GrallocBufferActor which turns a safe abstraction (android::GraphicBuffer) into an unsafe segfault machine.

In the short run we need to keep stitching but this crash can be caused by pretty much anything that touches the fragile equilibrim of DeprecatedGrallocTextureHostOGL and GrallocBufferActor
Flags: needinfo?(nical.bugzilla)
(In reply to Zac C (:zac) from comment #35)
> STR:
> 1. Push an image to the sdcard/gallery
> 2. Open contacts app
> 3. Tap the 'Add picture' to add a pic/avatar
> 4. Select 'From gallery'
> 5. Choose the image pushed in step 1
> 6. Crop the image a bit
> 7. Tap the tickbox
> Device will crash.
It seems bug 901705.
Yes same STR but different crash report.

I filed this one:
https://crash-stats.mozilla.com/report/index/50c5c6bd-97b6-454b-b5c0-834772130806
See Also: → 902927
We still can reproduce this with the STR described in  comment #35 

Gecko  http://hg.mozilla.org/mozilla-central/rev/fd4cf30428b0
Gaia  ae1905a5bf10f064a86731e2bff195c2bccf620f
BuildID 20130808040203
Version 26.0a1

Reopening this so we can xfail the test and track this crash
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Florin, file a new bug, likely the same underlying issue as bug 901705 and bug 902927 (see comment 36).
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
See Also: → 901705
See Also: → 903680
Attempted STR from Comment 0 and unable to reproduce reported result.

Using below Environmental Variables, device did not crash following STR on Buri device

Environmental Variables
Build ID: 20130830040204
Gecko: http://hg.mozilla.org/mozilla-central/rev/c7459bc8e449
Gaia: 407fbfb6a9de68ec4db2f0f3dc6c67463e293f47
Platform Version: 26.0a1
RIL Version: 01.01.00.019.206 
Base Image 20130823
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: