Closed
Bug 915869
Opened 11 years ago
Closed 11 years ago
Firefox OS crash in mozalloc_abort(char const*) | NS_DebugBreak | mozilla::layers::PGrallocBufferParent::Write(mozilla::layers::PGrallocBufferParent*, IPC::Message*, bool)
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox25 | --- | wontfix |
firefox26 | --- | fixed |
firefox27 | --- | verified |
firefox-esr24 | --- | unaffected |
b2g-v1.2 | --- | verified |
People
(Reporter: marcia, Assigned: bjacob)
References
Details
(4 keywords, Whiteboard: [burirun1][osrestartcrash][b2g-crash])
Crash Data
Attachments
(5 files, 1 obsolete file)
1.20 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
20.20 KB,
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
8.69 KB,
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
25.13 KB,
text/plain
|
Details | |
84.00 KB,
text/plain
|
Details |
This bug was filed from the Socorro interface and is
report bp-bdf38910-9c8f-4c34-b14d-b50e52130912.
=============================================================
Seen multiple times while testing Buri device today. Seen while changing wallpaper and operating in Gallery. More crashes here: https://crash-stats.mozilla.com/report/list?product=B2G&signature=mozalloc_abort%28char%20const*%29%20|%20NS_DebugBreak%20|%20mozilla::layers::PGrallocBufferParent::Write%28mozilla::layers::PGrallocBufferParent*,%20IPC::Message*,%20bool%29
Buri Device using:
Gaia 9ffd2899eb91388f7fc1ce6f7a895a6f5f922c05
SourceStamp a98569f21abe
BuildID 20130912040201
Version 26.0a1
STR:
1. Long press on homescreen to change wallpaper.
2. Select a picture from Wallpaper
3. Device crashes.
This is not 100% reproducible but the crash has been seen during other operations as well.
Updated•11 years ago
|
Updated•11 years ago
|
Keywords: regressionwindow-wanted
Reporter | ||
Comment 2•11 years ago
|
||
All the crashes are occurring with 20130912040201 Build ID. Yesterday's build had graphics crashes as well, but not in this same stack. Might be hard to get a regression window unless we can reproduce reliably, which right now I cannot.
Updated•11 years ago
|
Whiteboard: burirun1
Updated•11 years ago
|
Whiteboard: burirun1 → [burirun1][osrestartcrash]
Comment 3•11 years ago
|
||
I can consistently reproduce this by visiting https://bugzilla.mozilla.org/show_bug.cgi?id=901214 in the browser, navigating around, and reloading.
Updated•11 years ago
|
Keywords: reproducible
Comment 4•11 years ago
|
||
FWIW, I can reproduce this on 20130913100534. Calling this out in burirun1 announcement to ensure visibility.
Comment 5•11 years ago
|
||
Does anybody testing this has the ability to change a single pref and re-run? If so, could you tell us if you can still reproduce after changing layers.force-tiles to false? This preference default is being changed today, so tonight's build will have that value anyway, but wanted to get a head start.
Updated•11 years ago
|
Assignee: nobody → milan
blocking-b2g: koi? → koi+
Assignee | ||
Comment 6•11 years ago
|
||
Not reproducible in emulator, with either value of the layers.force-tiles preference. Hwc related?
Assignee | ||
Comment 7•11 years ago
|
||
Still can't reproduce in emulator (after much usage).
But the stack looks like the _converse_ of bug 912725.
In bug 912725 we had a TextureHost that was failing to notify the GrallocBufferActor when it goes away. So the GrallocBufferActor was keeping a dangling pointer to the dead TextureHost.
Here we have the opposite. Some GrallocBufferActor failed to inform a TextureHost that it went away. So the TextureHost crashes as it kept a dangling pointer to the dead GrallocBufferActor.
Assignee | ||
Comment 8•11 years ago
|
||
Looking at code, there are two ways that we could get into this situation:
1) (Unlikely) The GrallocBufferActor was destroyed but ActorDestroy() was never called on it.
2) (More likely) Some TextureHost had its mBuffer pointer pointing to that GrallocBufferActor, but that was manually set, not set by SetBuffer() which does the right thing to inform the GrallocBufferActor of the TextureHost referencing it.
Assignee | ||
Comment 9•11 years ago
|
||
Grepping through code, it appears that the patch in bug 912725 fixed the last occurrence of issue 2) in comment 8, so the cause of the bug must be different.
Here is a scenario that might conceivably be what's happening (hey, I still can't reproduce locally, so I have to make guesses).
Suppose we had a TextureHost, let's call it TextureHost1, and a GrallocBufferActor normally referencing each other:
TextureHost1 <------> GrallocBufferActor
Now suppose that another TextureHost, let's call it TextureHost2, is being passed that same GrallocBufferActor (could conceivably happen with a DeprecatedImageHostBuffered's way of doing double buffering). So we now have TextureHost2 and GrallocBufferActor referencing each other, but the TextureHost1 was never notified about it, so the graph of references now is
TextureHost1 ------> GrallocBufferActor <-------> TextureHost2
So when the GrallocBufferActor goes away, as it only remembers about TextureHost2 and not about TextureHost1, it only notifies the TextureHost2 to drop its reference to it:
TextureHost1 ------> DEAD GrallocBufferActor TextureHost2
So we are left with a TextureHost1 referencing a DEAD GrallocBufferActor. When TextureHost1 is destroyed, it will send a __delete__ message to the dead GrallocBufferActor, giving a crash with the same stack as we're observing here.
Assignee | ||
Comment 10•11 years ago
|
||
Yup, this looks like an omission on my part when I wrote that very hacky code trying to work around the tragedy that is GrallocBufferActor.
See comment 9 for the theory that this is implementing.
Attachment #804596 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 11•11 years ago
|
||
...there was a bug in my previous patch (which shows how tricky this all it).
In case SetDeprecatedTextureHost is passed the same DeprecatedTextureHost as we already have, we don't want to tell it to forget about us!
Attachment #804596 -
Attachment is obsolete: true
Attachment #804596 -
Flags: review?(jmuizelaar)
Attachment #804598 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 12•11 years ago
|
||
Comment on attachment 804598 [details] [diff] [review]
Let the GrallocBufferActor inform its old TextureHost when it's switching to another TextureHost
Cancelling reviews for now, because this crashes. The problem is that ForgetBuffer tells the TextureHost to delete mBuffer. But as (that is the whole point of this bug and patch) multiple TextureHosts can be dealing with the same buffer, and mBuffer is a pointer so that currently two TextureHosts can be referencing the same SurfaceDescriptor... so if one deletes it, the other crashes.
This additional level of indirection is apparently useless anyway, so let's remove it.
Attachment #804598 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 14•11 years ago
|
||
This patch changes the mBuffer field of DeprecatedTextureHost to be a SurfaceDescriptor stored by value, instead of a SurfaceDescriptor* pointer.
That third level of indirection (a SurfaceDescriptor is already a GrallocBufferActor*, and a GrallocBufferActor points to a android::GraphicBuffer...) seems useless, right?
I need to do this, because the problem with my other patch on this bug is that two DeprecatedTextureHosts may be pointing to the same SurfaceDescriptor, so that if one gets notified when the GrallocBufferActor goes away, and ForgetBuffer() is called, doing | delete mBuffer; |, the other TextureHost still has its mBuffer pointing to that now-dead SurfaceDescriptor. So it makes my life a lot easier if DeprecatedTextureHosts each store their own SurfaceDescriptor by value.
Is that OK?
Attachment #804805 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 15•11 years ago
|
||
Comment on attachment 804598 [details] [diff] [review]
Let the GrallocBufferActor inform its old TextureHost when it's switching to another TextureHost
Review of attachment 804598 [details] [diff] [review]:
-----------------------------------------------------------------
Alright, time to re-request review. With the other patch here, ensuring that no two DeprecatedTextureHosts point refer to the same descriptor, this patch now seems to work fine.
I still couldn't reproduce the bug originally reported here (not even now that I have a hamachi) so I can't be certain that it will be fixed by these patches, but regardless, these patches are needed IMO.
Attachment #804598 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•11 years ago
|
Assignee: milan → bjacob
Assignee | ||
Updated•11 years ago
|
Attachment #804598 -
Attachment description: SetDeprecatedTextureHost → Let the GrallocBufferActor inform its old TextureHost when it's switching to another TextureHost
Assignee | ||
Updated•11 years ago
|
Attachment #804805 -
Attachment description: mbuffer-not-pointer → DeprecatedTextureHost should store its SurfaceDescriptor by value, and not share it with anything else
Comment 16•11 years ago
|
||
Do you need a regression window here? Looks like you've got patches for a fix, so I'm wondering if we still need to try to get a regression window.
Assignee | ||
Comment 17•11 years ago
|
||
Good question. I think that at this point, what I would like the most to know is if you can still reproduce this crash with a build using today's mozilla-central (so, any build labeled 20130914 or later), as I've been unable to reproduce (despite identifying various issues in the source code that would explain these crashes, whence the patches).
Updated•11 years ago
|
Attachment #804598 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 18•11 years ago
|
||
Comment 19•11 years ago
|
||
Comment on attachment 804805 [details] [diff] [review]
DeprecatedTextureHost should store its SurfaceDescriptor by value, and not share it with anything else
Review of attachment 804805 [details] [diff] [review]:
-----------------------------------------------------------------
Using a pointer for mBuffer was meant to let us reduce header inclusion but if it caused confusion the header thing is not that important here (especially now that we want to kill DeprecatedTextureHost anyway).
Attachment #804805 -
Flags: review?(nical.bugzilla) → review+
Reporter | ||
Comment 20•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #17)
> Good question. I think that at this point, what I would like the most to
> know is if you can still reproduce this crash with a build using today's
> mozilla-central (so, any build labeled 20130914 or later), as I've been
> unable to reproduce (despite identifying various issues in the source code
> that would explain these crashes, whence the patches).
I just crashed in this stack using the following Buri build:
Gaia a0079597d510ce8ea0b9cbb02c506030510b9eeb
SourceStamp c4bcef90cef9
BuildID 20130916040205
Version 26.0a1
I had just done something in camera when the crash occurred.
Assignee | ||
Comment 21•11 years ago
|
||
Rebased, new try push, hopefully the oranges were another bug that's no longer here...
https://tbpl.mozilla.org/?tree=Try&rev=38f99001e298
Assignee | ||
Comment 22•11 years ago
|
||
Meh, tested locally, got about:memory reports from reftests running in emulator, my patch is leaking gralloc memory like a sieve... back to drawing board.
Updated•11 years ago
|
Whiteboard: [burirun1][osrestartcrash] → [burirun1][osrestartcrash][b2g-crash]
Assignee | ||
Comment 25•11 years ago
|
||
My attempts so far were working with the assumption/hope that no two TextureHosts would want to reference the same GrallocBufferActor at the same time, but that was very tricky in presence of double buffering using 2 texturehosts.
This is a different approach and I believe it is simpler and doesn't make such assumptions. It passes reftest locally and everything seems to work fine in the emulator (browsing and various gaia apps) and I looked in about:memory, it doesn't seem to leak.
https://tbpl.mozilla.org/?tree=Try&rev=297baa2d8464
Attachment #805997 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 26•11 years ago
|
||
Note the nsTAutoArray with built-in size 2... the assumption here is that a GBA will rarely be referenced by more than 2 THosts, is that an OK assumption? If there are more THosts than expected, the array will become heap-allocated (slower).
Comment 27•11 years ago
|
||
Comment on attachment 805997 [details] [diff] [review]
Let the GrallocBufferActor keep an _array_ of backrefs to TextureHosts referencing it
Review of attachment 805997 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +1209,5 @@
> {
> MOZ_ASSERT(!mBuffer, "Will leak the old mBuffer");
> +
> + if (aBuffer != mBuffer) {
> + // only done for hacky fix in gecko 23 for bug 862324.
lame nit: you can also remove "gecko 23" here
Attachment #805997 -
Flags: review?(nical.bugzilla) → review+
Comment 28•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #26)
> Note the nsTAutoArray with built-in size 2... the assumption here is that a
> GBA will rarely be referenced by more than 2 THosts, is that an OK
> assumption? If there are more THosts than expected, the array will become
> heap-allocated (slower).
It looks like a fair assumption, and I am pretty sure it will not make a difference if the assumption turns out to be wrong.
Comment 29•11 years ago
|
||
I think I understand how to reproduce the crash on master hamachi.
-[1] Open 3-4 tabs on the browser app.
-[2] open relatively complex site in each tabs.
-[3] refresh each tab's content one by one.
hamachi do not have enough memory to load all tabs. low memory killer kills a background tab. The tab icon was changed to the crashed state icon. Continue the STR until b2g crash.
Comment 30•11 years ago
|
||
Before the crash happens, child side print no error. From this, I suspected low memory killer.
Comment 31•11 years ago
|
||
low memory killer killed a lot of processes.
Updated•11 years ago
|
Attachment #806021 -
Attachment is patch: false
Comment 32•11 years ago
|
||
By applying the patches, the crash did not happen in the STR in Comment 29 until now.
Assignee | ||
Comment 33•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #32)
> By applying the patches, the crash did not happen in the STR in Comment 29
> until now.
Many thanks for testing!
Comment 34•11 years ago
|
||
Bug reproduces on the latest build, but was not able to reproduce it on the previous builds with steps in comment 29
Cannot get the regression window since it's doesn't reproduce frequently
Comment 35•11 years ago
|
||
(In reply to sarsenyev from comment #34)
> Bug reproduces on the latest build, but was not able to reproduce it on the
> previous builds with steps in comment 29
> Cannot get the regression window since it's doesn't reproduce frequently
I think we've got a path forward here now, so let's disregard the regression window at this point.
Keywords: regressionwindow-wanted
Comment 36•11 years ago
|
||
(In reply to sarsenyev from comment #34)
> Bug reproduces on the latest build, but was not able to reproduce it on the
> previous builds with steps in comment 29
> Cannot get the regression window since it's doesn't reproduce frequently
From last Friday to Monday morning, tiled rendering was enabled. On tiled rendering enabled ROM, this crash is very difficult to generate.
Reporter | ||
Comment 37•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> (In reply to sarsenyev from comment #34)
> > Bug reproduces on the latest build, but was not able to reproduce it on the
> > previous builds with steps in comment 29
> > Cannot get the regression window since it's doesn't reproduce frequently
>
> From last Friday to Monday morning, tiled rendering was enabled. On tiled
> rendering enabled ROM, this crash is very difficult to generate.
Hello Sotaro: Can you please clarify if you need QA's help in reproducing this bug? In yesterday's build my feeling was the crash was happening pretty randomly.
Assignee | ||
Comment 38•11 years ago
|
||
Landed the first and third patch. The second one isn't needed, won't land.
remote: https://hg.mozilla.org/integration/b2g-inbound/rev/f71237f92e82
remote: https://hg.mozilla.org/integration/b2g-inbound/rev/60a5068b18b8
Target Milestone: --- → mozilla27
Comment 39•11 years ago
|
||
(In reply to Marcia Knous [:marcia] from comment #37)
>
> Hello Sotaro: Can you please clarify if you need QA's help in reproducing
> this bug? In yesterday's build my feeling was the crash was happening pretty
> randomly.
When I testing, I used following URL. I open 3 tabs using the following URL. And then choose a background tab to foreground. And continue forever until the crash happens.
http://www.theregister.co.uk/2013/09/17/yahoo_groups_complaints_continue/
On my hamachi, I can consistently regenerate the crash until now.
Comment 40•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> (In reply to Marcia Knous [:marcia] from comment #37)
> When I testing, I used following URL. I open 3 tabs using the following URL.
> And then choose a background tab to foreground. And continue forever until
> the crash happens.
> http://www.theregister.co.uk/2013/09/17/yahoo_groups_complaints_continue/
When a content of the URL is shown, I always select another tab that is already dead icon.
Comment 41•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f71237f92e82
https://hg.mozilla.org/mozilla-central/rev/60a5068b18b8
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 42•11 years ago
|
||
Attempts to reproduce STR from comment 0 were not successful and appears resolved. Tried various wallpapers (10) and could not reproduce intermittent behavior.
In followup to comment 39 - tested the URL from comment 39 http://www.theregister.co.uk/2013/09/17/yahoo_groups_complaints_continue/ in 3 separate tabs in Browser.
When each tab was opened, web page displayed properly. However, when user goes into tab view within Browser, the icon for two of the tabs appear as a frowned folder with the website description to the right of image.
Environmental Variables
Build ID: 20130917180029
Gecko: http://hg.mozilla.org/mozilla-central/rev/53a2011a01b2
Gaia: 678c1d23150b0b9b037de9b2122fcccd75bdb6f5
Platform Version: 27.0a1
Buri Base 20130917
Comment 43•11 years ago
|
||
Attempts to reproduce STR from comment 0 were not successful. Tried various wallpapers (10) and could not reproduce intermittent behavior.
In followup to comment 39 - tested the URL from comment 39 http://www.theregister.co.uk/2013/09/17/yahoo_groups_complaints_continue/ in 3 separate tabs in Browser.
When each tab was opened, web page displayed properly. However, when user goes into tab view within Browser, the icon for two of the tabs appear as a frowned folder with the website description to the right of image.
Environmental Variables
Build ID: 20130917180029
Gecko: http://hg.mozilla.org/mozilla-central/rev/53a2011a01b2
Gaia: 678c1d23150b0b9b037de9b2122fcccd75bdb6f5
Platform Version: 27.0a1
Buri Base 20130917
Comment 45•11 years ago
|
||
Keywords: intermittent-failure
Comment 46•11 years ago
|
||
bjacob, looks like this is still happening. Whats the plan here?
Flags: needinfo?(bjacob)
Assignee | ||
Comment 47•11 years ago
|
||
Comment 45 is on Mozilla-Aurora. That tree doesn't have that patch. So the plan to fix that is to request aurora approval. I sync'd with release management on Tuesday, see "Where are we on merge day?" thread on release@, and my understanding was that it had landed early enough to be picked up for B2G v1.2.
Flags: needinfo?(bjacob)
Assignee | ||
Comment 48•11 years ago
|
||
Comment on attachment 804598 [details] [diff] [review]
Let the GrallocBufferActor inform its old TextureHost when it's switching to another TextureHost
There are two patches to be backported here. This one, and the 3rd one. This one is not by itself the 'right' fix, but it is what the other one applies on top of.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): regressed between gecko 18 and gecko 26
User impact if declined: B2G crashes often
Testing completed (on m-c, etc.): m-c for 1.5 days
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #804598 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 49•11 years ago
|
||
Comment on attachment 805997 [details] [diff] [review]
Let the GrallocBufferActor keep an _array_ of backrefs to TextureHosts referencing it
This one is the 'right' fix. It fixes the crashes according to QA (see above comments) and doesn't leak contrary to the earlier approaches tried on this bug (it's green on TBPL while the other approaches leaked too much to complete reftest runs successfully). So it is IMO beneficial and low-risk enough for Aurora approval.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): some time between gecko 18 and 26
User impact if declined: B2G crashes often (or, if one only takes the first patch and not this one, B2G leaks)
Testing completed (on m-c, etc.): m-c for 1.5 days
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none
Attachment #805997 -
Flags: approval-mozilla-aurora?
Reporter | ||
Comment 50•11 years ago
|
||
I saw this crash last evening and filed Bug 918178, I can dupe it back to this bug since you are going to put the fix on that branch,
Comment 51•11 years ago
|
||
This is koi+, so the patches don't need explicit approval to land.
https://hg.mozilla.org/releases/mozilla-aurora/rev/9a3414b0640c
https://hg.mozilla.org/releases/mozilla-aurora/rev/19d3cfc6cb92
status-b2g-v1.2:
--- → fixed
status-firefox25:
--- → wontfix
status-firefox26:
--- → fixed
status-firefox27:
--- → fixed
Updated•11 years ago
|
Attachment #804598 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
Attachment #805997 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
status-firefox-esr24:
--- → unaffected
I am still seeing this affect aurora; moz-central has an issue where the wallpaper is not seen and the graphic is converted to text, with no image seen (bug 919452), history is not seen (bug 918970)
Should I open a new bug? or reopen this bug?
Assignee | ||
Comment 53•11 years ago
|
||
Can you post a stack or crash link for the crash on aurora?
The moz-central symptoms you're describing don't sound related to this bug, unless you know something that I don't :-)
I think I made a mistake when I thought I flashed my build on top. I might have not done so as the crash reports in socorro doesn't reflect the 9/22 build I used.
I'm reinvestigating and I don't see the Wallpaper option in aurora. If I come across this crash I'll report it as a new bug.
Comment 55•11 years ago
|
||
Verified fixed on v1.2 and current master,
- the crash doesn't happen when longpress homescreen and select a wallpaper (comment 0)
- the crash doesn't happen when open 3-4 browser tabs and keep refreshing (comment 39)
Build ID: 20130924004002
Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/b34384409be6
Gaia: a13c76f8d3c617ee57c302c103da04ed1a6298d1
Platform Version: 26.0a2
Build ID: 20130924040201
Gecko: http://hg.mozilla.org/mozilla-central/rev/1fda74e33e06
Gaia: a22ba4a3a9efd99f94adf9ece8a2b391d4df295b
Platform Version: 27.0a1
You need to log in
before you can comment on or make changes to this bug.
Description
•