If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

crash in mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)

VERIFIED FIXED in mozilla25

Status

()

Core
Graphics: Layers
--
critical
VERIFIED FIXED
4 years ago
4 years ago

People

(Reporter: zac, Assigned: nical)

Tracking

(4 keywords)

Trunk
mozilla25
All
Android
crash, regression, reproducible, topcrash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [b2g-crash][fromAutomation], crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

4 years ago
Just came in on today's mc engineering build:


Gecko  http://hg.mozilla.org/mozilla-central/rev/c2b375f3a909
Gaia  9bfceaa90e8b92a379432b67121afa3cd3f14c90
BuildID 20130731030205
Version 25.0a1


Steps to replicate:
1. Load camera app
2. Take a photo
3. Tap gallery app
4. Device will crashy crash
(Reporter)

Comment 1

4 years ago
An additional crash report with a different STR

https://crash-stats.mozilla.com/report/index/e71dfc17-680c-4398-9ba2-1f4392130731

1. adb push videofile.mp4 sdcard/
2. Load videoplayer app
3. Wait for pushed video file to appear
4. Tap video file to play
5. Device says no.

Replicated manually.
Automation shows it crashes with other video formats too.

Updated

4 years ago
Whiteboard: [fromAutomation] → [b2g-crash][fromAutomation]

Updated

4 years ago
Component: General → Graphics: Layers
Product: Boot2Gecko → Core
Version: unspecified → Trunk

Updated

4 years ago
Keywords: regression, reproducible

Updated

4 years ago
Blocks: 884399
Joe - Any ideas on what caused this regression? We just saw this on today's m-c b2g build.
Flags: needinfo?(joe)
More simple steps:
1. Load camera app
2. Tap on the video icon to switch source
I don't, but perhaps Nical does.
Flags: needinfo?(joe) → needinfo?(nical.bugzilla)
Looks like this might be caused by the work going on in bug 858914.
(Assignee)

Comment 6

4 years ago
(In reply to Jason Smith [:jsmith] from comment #5)
> Looks like this might be caused by the work going on in bug 858914.

Actually, Bug 858914 is aiming at fixing this kind of bugs.
So far, nothing of the work in 858914 that landed affects B2G (it is preffed off and I am still working on the B2G part of that bug).

Another bug that is worth looking at is Bug 879681 where we want to remove PGrallocBufferActor which is a source off so many problems. But it's not trivial and I have a lot to do with 858914 already so I wouldn't expect a patch for 879681 in the very short term (except if someone wants to take over this bug).
Depends on: 879681, 858914
Flags: needinfo?(nical.bugzilla)
What device?  Also, this implies that it just showed up in 2013-07-31, but can we confirm that it was ok on 2013-07-30 or not?
Flags: needinfo?(zcampbell)
Keywords: regressionwindow-wanted
(Reporter)

Comment 8

4 years ago
Unagi device (this is in the crash report too)

This crash was definitely not in yesterday's 2013-07-30 master build. 
This showed up in our first test automation run on the 2013-07-31 build today and hit 5-6 of our tests pretty hard; it was quite visible :)
Flags: needinfo?(zcampbell)
So that makes the regression range between 7/30/2013 - 7/31/2013 on the gecko side.

I see that a large patch landed from graphics layers in bu 866232. I see a smaller patch that landed in bug 899730 as well.

Updated

4 years ago
Duplicate of this bug: 900076
So, http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d40d270c031&tochange=c2b375f3a909 as the regression window?
Nical's bug 898914 being pref'd off non-withstanding, I'd still like to confirm that didn't break it. Benoit, can you do a quick test, Nical's probably gone for the day. It's this one: http://hg.mozilla.org/mozilla-central/rev/313445f455f3
Flags: needinfo?(bjacob)
(In reply to Milan Sreckovic [:milan] from comment #11)
> So,
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=3d40d270c031&tochange=c2b375f3a909 as the regression
> window?

Looks like it.
Keywords: regressionwindow-wanted

Comment 14

4 years ago
Issue looks similar to issue in bug 880780 + crash; after user launching camera app he see green screen before crash. 
Crash ID: bp-0309ef6f-4c90-4c17-9710-4a8c82130731
(In reply to Milan Sreckovic [:milan] from comment #12)
> Nical's bug 898914 being pref'd off non-withstanding, I'd still like to
> confirm that didn't break it. Benoit, can you do a quick test, Nical's
> probably gone for the day. It's this one:
> http://hg.mozilla.org/mozilla-central/rev/313445f455f3

Confirmed that this cset is responsible for this regression.

With 313445f455f3, camera app locks up (Camera process still visible in adb shell b2g-ps, though)

With the parent cset 91837985ae91, camera app works fine.

-> regressed by bug 858914
Flags: needinfo?(bjacob)

Comment 16

4 years ago
http://youtu.be/1Do6SXtKn9M video of issue happening.
Thanks Benoit. So can we backout the changeset in question here to get this working again?
(In reply to Jason Smith [:jsmith] from comment #17)
> Thanks Benoit. So can we backout the changeset in question here to get this
> working again?

RyanVM is currently working on backing out the changeset in question.
This is a huge changeset, I have no idea how hard it will be to back out. Its author is in Europe, too, so I wouldn't him to be available until tomorrow.
(In reply to Benoit Jacob [:bjacob] from comment #19)
> This is a huge changeset, I have no idea how hard it will be to back out.
> Its author is in Europe, too, so I wouldn't him to be available until
> tomorrow.

Yup. Discussed in IRC - nical is going to look into this first thing tomorrow his time.

Updated

4 years ago
Crash Signature: [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] → [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] [@ nsXPConnect::XPConnect()]
(Assignee)

Comment 21

4 years ago
The crash happens in some hacky code that was written in order to handle corner cases where the lifetime of Gralloc buffers is hard to track: When we set a gralloc buffer to a TextureHost we register this texture host to the the gralloc buffer so that the latter can ask one last thing (ForgetBuffer()) to the TextureHost before being destroyed.

It looks very much like the registered TextureHost is destroyed when the GrallocBufferActor's destructor tries to call forget buffer on it (dangling pointer crash).

bjacob you understand this code better than me so please let me know if I missed something.

In GrallocDeprecatedTextureHostOGL::SwapTextureImpl, we register to the gralloc buffer we just received, but we don't unregister from the previous one.

Also we don't seem to set the previous gralloc as the back buffer of the TextureHost. if we were doing any of those two i'd probably understand how the thing holds together but right now it's like we have always been lucky that the TextureHost would outlibe the GrallocBufferActor.

The funkiness here is that (I think) double buffering at the texture host level is only used in ImageLayers for async video. So subtle brokenness in GrallocDeprecatedTextureHost's double buffering will not break ThebesLayers (hich is the most exercised code path and probably where we were testing the register gralloc hack thingy).

Also Benoit, why do we need to call ForgetBuffer() in the GrallocBufferActor's destructor?
what harm is there to keep the android::GraphicBuffer alive a bit longer?
We are looking for a solution today (Aug 1); if not found, we will prepare a backout for tomorrow (Aug 2).
(Assignee)

Comment 23

4 years ago
Created attachment 784606 [details] [diff] [review]
make async-video double buffered.

This patch reverts a one-line chunk that should not have been part of the patch part 4 in 858914, that made async-video single buffered. It fixes the crash, although I am not sure why. double buffering for async video was meant to avoid flashing on desktop.
Assignee: nobody → nical.bugzilla
Attachment #784606 - Flags: review?(matt.woodrow)
Comment on attachment 784606 [details] [diff] [review]
make async-video double buffered.

Review of attachment 784606 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPlatform.h
@@ +564,5 @@
>       */
>      bool PreferMemoryOverShmem() const;
> +    bool UseDeprecatedTextures() const {
> +      if (!mLayersUseDeprecated) MOZ_CRASH("SHOULD USE DEPRECATED TEXTURES!");
> +      return mLayersUseDeprecated;

Is this intentional? I'm hoping to have this enabled on OSX asap.
Attachment #784606 - Flags: review?(matt.woodrow) → review+
(Assignee)

Comment 25

4 years ago
Created attachment 784614 [details] [diff] [review]
Make async-video double buffered. (carries mattwoodrow:r+)

The patch without the debug assertion
Attachment #784606 - Attachment is obsolete: true
(Assignee)

Comment 26

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c2f8e48c612
(Assignee)

Comment 27

4 years ago
The image in the screen is still all messed up (green) on my unagi, though. Did we already have that before 858914? If I take pictures or videos, the image/video that is saved is correct, but not the image on the screen while I am filming.
Whiteboard: [b2g-crash][fromAutomation] → [b2g-crash][fromAutomation][leave-open]
(In reply to Nicolas Silva [:nical] from comment #27)
> The image in the screen is still all messed up (green) on my unagi, though.
> Did we already have that before 858914? If I take pictures or videos, the
> image/video that is saved is correct, but not the image on the screen while
> I am filming.

Yup. That's bug 880780.
(Assignee)

Comment 29

4 years ago
With the patch applied, I also can't reproduce the crash described in comment 1.
Whiteboard: [b2g-crash][fromAutomation][leave-open] → [b2g-crash][fromAutomation]
Duplicate of this bug: 900710
https://hg.mozilla.org/mozilla-central/rev/0c2f8e48c612
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Works for me on Hamachi now.

Updated

4 years ago
Crash Signature: [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] [@ nsXPConnect::XPConnect()] → [@ mozilla::layers::GrallocBufferActor::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason)] [@ mozilla::docshell::POfflineCacheUpdateParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla&hellip;
Keywords: topcrash
Duplicate of this bug: 901240

Comment 34

4 years ago
Are you sure it hasn't regressed since because I see crashes in 25.0a1/20130805 (see https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3AGrallocBufferActor%3A%3AActorDestroy%28mozilla%3A%3Aipc%3A%3AIProtocolManager%3Cmozilla%3A%3Aipc%3A%3ARPCChannel%3A%3ARPCListener%3E%3A%3AActorDestroyReason%29)?
Flags: needinfo?(nical.bugzilla)
(Reporter)

Comment 35

4 years ago
OK Scoobi, you are in luck, I have replicated it! Totally by coincidence I might add..

STR:
1. Push an image to the sdcard/gallery
2. Open contacts app
3. Tap the 'Add picture' to add a pic/avatar
4. Select 'From gallery'
5. Choose the image pushed in step 1
6. Crop the image a bit
7. Tap the tickbox
Device will crash.

Device: Unagi
Build:
Gecko  http://hg.mozilla.org/mozilla-central/rev/ad0ae007aa9e
Gaia  0ca0dba246d1372eb815772c8108395ab0abd0a4
BuildID 20130805070203
Version 25.0a1
(Assignee)

Comment 36

4 years ago
The crash here happens in some very badly broken code that tries to stitch together even more broken code. It is in a fragile equilibrium and it will break again until we get to fix the architectural badness that got us in this situation. Ironically bug 858914 is largely motivated by fixing the broken ownership model of gralloc buffers (the code that crashed) and it also incidentally modified code that led this to regress (the modification has been backed out by the patch in this bug).

My point is: Ownership model of gralloc buffers is badly broken and IMO there are two things that can really help in the long run:
* Bug 858914 Get the new textures to work with gralloc (what i am working on at the moment) which will remove a portion of the brokenness.
* Bug 879681 remove GrallocBufferActor which turns a safe abstraction (android::GraphicBuffer) into an unsafe segfault machine.

In the short run we need to keep stitching but this crash can be caused by pretty much anything that touches the fragile equilibrim of DeprecatedGrallocTextureHostOGL and GrallocBufferActor
Flags: needinfo?(nical.bugzilla)

Comment 37

4 years ago
(In reply to Zac C (:zac) from comment #35)
> STR:
> 1. Push an image to the sdcard/gallery
> 2. Open contacts app
> 3. Tap the 'Add picture' to add a pic/avatar
> 4. Select 'From gallery'
> 5. Choose the image pushed in step 1
> 6. Crop the image a bit
> 7. Tap the tickbox
> Device will crash.
It seems bug 901705.
(Reporter)

Comment 38

4 years ago
Yes same STR but different crash report.

I filed this one:
https://crash-stats.mozilla.com/report/index/50c5c6bd-97b6-454b-b5c0-834772130806

Updated

4 years ago
See Also: → bug 902927
We still can reproduce this with the STR described in  comment #35 

Gecko  http://hg.mozilla.org/mozilla-central/rev/fd4cf30428b0
Gaia  ae1905a5bf10f064a86731e2bff195c2bccf620f
BuildID 20130808040203
Version 26.0a1

Reopening this so we can xfail the test and track this crash
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 40

4 years ago
Florin, file a new bug, likely the same underlying issue as bug 901705 and bug 902927 (see comment 36).
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
See Also: → bug 901705

Updated

4 years ago
See Also: → bug 903680
Attempted STR from Comment 0 and unable to reproduce reported result.

Using below Environmental Variables, device did not crash following STR on Buri device

Environmental Variables
Build ID: 20130830040204
Gecko: http://hg.mozilla.org/mozilla-central/rev/c7459bc8e449
Gaia: 407fbfb6a9de68ec4db2f0f3dc6c67463e293f47
Platform Version: 26.0a1
RIL Version: 01.01.00.019.206 
Base Image 20130823
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.