Closed
Bug 914823
Opened 11 years ago
Closed 11 years ago
crash in mozilla::gl::SharedSurface_Gralloc::~SharedSurface_Gralloc
Categories
(Core :: Graphics, defect, P1)
Tracking
()
People
(Reporter: nhirata, Assigned: bjacob)
References
(Depends on 1 open bug)
Details
(4 keywords, Whiteboard: [b2g-crash] QARegressExclude)
Crash Data
Attachments
(8 files, 4 obsolete files)
1.04 KB,
patch
|
jgilbert
:
review+
|
Details | Diff | Splinter Review |
4.56 KB,
patch
|
jgilbert
:
review+
nical
:
review+
sotaro
:
feedback+
|
Details | Diff | Splinter Review |
76.42 KB,
text/plain
|
Details | |
5.12 KB,
patch
|
Details | Diff | Splinter Review | |
73.91 KB,
text/plain
|
Details | |
742 bytes,
text/plain
|
Details | |
2.74 KB,
patch
|
Details | Diff | Splinter Review | |
1.71 KB,
patch
|
sotaro
:
review+
sotaro
:
feedback+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-4e6f43df-cc97-436e-a74b-486172130910.
=============================================================
Crashing Thread
Frame Module Signature Source
0 libxul.so mozilla::gl::SharedSurface_Gralloc::~SharedSurface_Gralloc gfx/gl/SharedSurfaceGralloc.cpp
1 libxul.so mozilla::gl::SharedSurface_Gralloc::~SharedSurface_Gralloc gfx/gl/SharedSurfaceGralloc.cpp
2 libxul.so mozilla::gfx::SurfaceStream::Delete(mozilla::gfx::SharedSurface*&) gfx/gl/SurfaceStream.cpp
3 libxul.so mozilla::gfx::SurfaceStream::~SurfaceStream gfx/gl/SurfaceStream.cpp
4 libxul.so mozilla::gfx::SurfaceStream_SingleBuffer::~SurfaceStream_SingleBuffer gfx/gl/SurfaceStream.cpp
5 libxul.so mozilla::gfx::SurfaceStream_SingleBuffer::~SurfaceStream_SingleBuffer gfx/gl/SurfaceStream.cpp
6 libxul.so mozilla::gl::GLScreenBuffer::~GLScreenBuffer gfx/gl/GLScreenBuffer.cpp
7 libxul.so mozilla::gl::GLScreenBuffer::~GLScreenBuffer gfx/gl/GLScreenBuffer.cpp
8 libxul.so mozilla::gl::GLContext::DestroyScreenBuffer() gfx/gl/GLContext.cpp
9 libxul.so mozilla::gl::GLContext::MarkDestroyed() gfx/gl/GLContext.cpp
10 libxul.so mozilla::gl::GLContextEGL::~GLContextEGL gfx/gl/GLContextProviderEGL.cpp
11 libxul.so mozilla::gl::GLContextEGL::~GLContextEGL gfx/gl/GLContextProviderEGL.cpp
12 libxul.so mozilla::detail::GenericRefCounted<(mozilla::detail::RefCountAtomicity)0>::Release() /builds/slave/b2g_m-cen_hamachi_ntly-0000000/build/objdir-gecko/gfx/gl/../../dist/include/mozilla/GenericRefCounted.h
13 libxul.so mozilla::gfx::DrawTargetSkia::~DrawTargetSkia /builds/slave/b2g_m-cen_hamachi_ntly-0000000/build/objdir-gecko/gfx/2d/../../dist/include/mozilla/RefPtr.h
14 libxul.so mozilla::gfx::DrawTargetSkia::~DrawTargetSkia gfx/2d/DrawTargetSkia.cpp
15 libxul.so mozilla::detail::RefCounted<mozilla::gfx::DrawTarget, (mozilla::detail::RefCountAtomicity)1>::Release() const /builds/slave/b2g_m-cen_hamachi_ntly-0000000/build/objdir-gecko/image/src/../../dist/include/mozilla/RefPtr.h
16 libxul.so mozilla::RefPtr<mozilla::gfx::DrawTarget>::operator=(mozilla::gfx::DrawTarget*) /builds/slave/b2g_m-cen_hamachi_ntly-0000000/build/objdir-gecko/content/canvas/src/../../../dist/include/mozilla/RefPtr.h
17 libxul.so mozilla::dom::CanvasRenderingContext2D::Reset() content/canvas/src/CanvasRenderingContext2D.cpp
18 libxul.so mozilla::dom::CanvasRenderingContext2D::~CanvasRenderingContext2D content/canvas/src/CanvasRenderingContext2D.cpp
19 libxul.so mozilla::dom::CanvasRenderingContext2D::~CanvasRenderingContext2D content/canvas/src/CanvasRenderingContext2D.cpp
20 libxul.so mozilla::dom::CanvasRenderingContext2D::DeleteCycleCollectable() content/canvas/src/CanvasRenderingContext2D.cpp
21 libxul.so mozilla::dom::CanvasRenderingContext2D::cycleCollection::DeleteCycleCollectable(void*) content/canvas/src/CanvasRenderingContext2D.h
22 libxul.so SnowWhiteKiller::~SnowWhiteKiller xpcom/base/nsCycleCollector.cpp
23 libxul.so nsCycleCollector::FreeSnowWhite(bool) xpcom/base/nsCycleCollector.cpp
24 libxul.so nsCycleCollector_doDeferredDeletion() xpcom/base/nsCycleCollector.cpp
25 libxul.so AsyncFreeSnowWhite::Run() js/xpconnect/src/XPCJSRuntime.cpp
26 libxul.so nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp
27 libxul.so NS_ProcessNextEvent(nsIThread*, bool) /builds/slave/b2g_m-cen_hamachi_ntly-0000000/build/objdir-gecko/xpcom/build/nsThreadUtils.cpp
28 libxul.so mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
29 libxul.so mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
30 libxul.so MessageLoop::RunInternal() ipc/chromium/src/base/message_loop.cc
31 libxul.so MessageLoop::Run() ipc/chromium/src/base/message_loop.cc
32 libxul.so nsBaseAppShell::Run() widget/xpwidgets/nsBaseAppShell.cpp
33 libxul.so XRE_RunAppShell toolkit/xre/nsEmbedFunctions.cpp
34 libxul.so mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
35 libxul.so MessageLoop::RunInternal() ipc/chromium/src/base/message_loop.cc
36 libxul.so MessageLoop::Run() ipc/chromium/src/base/message_loop.cc
37 libxul.so XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp
38 plugin-container main ipc/app/MozillaRuntimeMain.cpp
39 libc.so __libc_init bionic/libc/bionic/libc_init_dynamic.c
40 @0xb00045a9
On ALCATEL ONE TOUCH FIRE
Build ID 20130910040201
Release Channel hamachi/1.2.0/nightly
More reports :
https://crash-stats.mozilla.com/report/list?product=B2G&signature=mozilla%3A%3Agl%3A%3ASharedSurface_Gralloc%3A%3A~SharedSurface_Gralloc
Comment 1•11 years ago
|
||
Getting this on Unagi/1.2 aswell
Gecko http://hg.mozilla.org/mozilla-central/rev/740094c07328
Gaia 753bed59566ad14c5e032e45d2b320ef9529ca9a
BuildID 20130909195843
Version 26.0a1
The replication is in our test automation that makes a phone call from the contacts app (Contact details page) but I cannot replicate it manually.
Setting the Firefox OS flags on this.
Comment 2•11 years ago
|
||
(In reply to Zac C (:zac) from comment #1)
> Getting this on Unagi/1.2 aswell
>
> Gecko http://hg.mozilla.org/mozilla-central/rev/740094c07328
> Gaia 753bed59566ad14c5e032e45d2b320ef9529ca9a
> BuildID 20130909195843
> Version 26.0a1
>
> The replication is in our test automation that makes a phone call from the
> contacts app (Contact details page) but I cannot replicate it manually.
>
> Setting the Firefox OS flags on this.
Is this test that's generating this crash consistently failing? Or only intermittently?
Also - Can you provide a link to the test that's failing?
Comment 3•11 years ago
|
||
I only ran it twice and got 100% replication but I can see it on our Jenkins instances too.
Link to the test:
https://github.com/mozilla/gaia-ui-tests/blob/master/gaiatest/tests/contacts/test_call_contact.py
Comment 4•11 years ago
|
||
Okay - sounds like this is reproducible then via the test in question.
Can we get a regression range of when this crash started happening?
Updated•11 years ago
|
blocking-b2g: --- → koi?
Comment 5•11 years ago
|
||
The regression range is simply:
Starting from the build in comment #1 (10th Sept)
Updated•11 years ago
|
Keywords: regressionwindow-wanted
Comment 6•11 years ago
|
||
(In reply to Zac C (:zac) from comment #5)
> The regression range is simply:
> Starting from the build in comment #1 (10th Sept)
Any chance to narrow it down to a changeset?
Keywords: regressionwindow-wanted
Updated•11 years ago
|
No longer blocks: b2g-central-dogfood
Updated•11 years ago
|
Blocks: b2g-central-dogfood
Comment 7•11 years ago
|
||
(In reply to Zac C (:zac) from comment #3)
> I only ran it twice and got 100% replication but I can see it on our Jenkins
> instances too.
>
>
> Link to the test:
> https://github.com/mozilla/gaia-ui-tests/blob/master/gaiatest/tests/contacts/
> test_call_contact.py
How do I run the testing? I couldn't reproduce it manually.
Comment 8•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #6)
> (In reply to Zac C (:zac) from comment #5)
> > The regression range is simply:
> > Starting from the build in comment #1 (10th Sept)
>
> Any chance to narrow it down to a changeset?
Not via automation easily. We have builds daily from release engineering, but not down to a changeset. We would need to do this via building manually and running the automation directly - which is going to take quite some time resources-wise. It's going to be easier to grab a push log and infer a potential regressing changeset.
(In reply to peter chang[:pchang] from comment #7)
> (In reply to Zac C (:zac) from comment #3)
> > I only ran it twice and got 100% replication but I can see it on our Jenkins
> > instances too.
> >
> >
> > Link to the test:
> > https://github.com/mozilla/gaia-ui-tests/blob/master/gaiatest/tests/contacts/
> > test_call_contact.py
>
> How do I run the testing? I couldn't reproduce it manually.
See https://github.com/mozilla/gaia-ui-tests/blob/master/README.md.
Updated•11 years ago
|
Whiteboard: [b2g-crash]
Comment 9•11 years ago
|
||
Checking the crash stack here, I can't see any obvious changeset in the push log that caused this crash. I'm cc-ing Vlad & Jeff here for input, as it looks like they interact with a lot of source code files seen in this stack.
Comment 10•11 years ago
|
||
Looks like this is a top 5 crash on 1.2 so we should investigate.
Assignee: nobody → milan
blocking-b2g: koi? → koi+
Comment 11•11 years ago
|
||
Any crash reports since 9/18?
Comment 12•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #11)
> Any crash reports since 9/18?
Yes. https://crash-stats.mozilla.com/report/index/19fbf3bd-0a9c-4ef9-977b-348d02130919.
Assignee | ||
Comment 13•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #12)
> (In reply to Milan Sreckovic [:milan] from comment #11)
> > Any crash reports since 9/18?
>
> Yes.
> https://crash-stats.mozilla.com/report/index/19fbf3bd-0a9c-4ef9-977b-
> 348d02130919.
That one says: Build ID 20130916040205
---> it's a build from 9/16.
Comment 14•11 years ago
|
||
Right - I meant to ask if there are any crashes on builds from 9/18 and later.
Comment 15•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #14)
> Right - I meant to ask if there are any crashes on builds from 9/18 and
> later.
Oh. Latest build ID shown in crash stats is 20130918152049. If we need to retest, we can just see if the Gaia UI Test from https://bugzilla.mozilla.org/show_bug.cgi?id=914823#c3 is still hitting this crash.
Comment 16•11 years ago
|
||
Assign to Benoit while we're sorting out if it's still there the next few days.
Assignee: milan → bjacob
Comment 17•11 years ago
|
||
Zac - Are you still seeing this crash in Gaia UI Automation with https://github.com/mozilla/gaia-ui-tests/blob/master/gaiatest/tests/contacts/test_call_contact.py?
Flags: needinfo?(zcampbell)
Reporter | ||
Comment 18•11 years ago
|
||
https://crash-stats.mozilla.com/report/index/2bcb59ad-d67c-401b-95cc-374512130919
https://crash-stats.mozilla.com/report/index/d84c6aed-10cf-4b5e-b2fe-305602130919
along with :
https://crash-stats.mozilla.com/report/index/adf9da9c-e15d-4fc0-ad4f-d3bbd2130919
I saw this after opening and killing the usage app on the first usage.
Gaia c932c482c6944fa32724ce7af9e5423c4c2bcccd
Gecko 2e560c5ce351
BuildID 20130919004001
Version 26.0a2
Comment 20•11 years ago
|
||
Just hit this crash in 1.3 - when it crashed it said "Twitter crashed."
https://crash-stats.mozilla.com/report/index/3ae6f263-5677-4cb8-a585-37dde2130920
Gaia 78a3e2509b979bd226f01f557bdf229fd3faa42e
SourceStamp 59beb1868522
BuildID 20130920040201
Version 27.0a1
Comment 21•11 years ago
|
||
Adding the Skia folks as well.
Assignee | ||
Comment 22•11 years ago
|
||
Alright, sounds like this is still an issue, will look into this today.
Assignee | ||
Comment 23•11 years ago
|
||
I tried running the gaia-ui-tests following these instructions:
https://github.com/mozilla/gaia-ui-tests/blob/master/README.md
but am running into a lot of trouble. I get
$ gaiatest gaiatest/tests/contacts/test_call_contact.py
Traceback (most recent call last):
File "/usr/local/bin/gaiatest", line 5, in <module>
from pkg_resources import load_entry_point
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2707, in <module>
working_set.require(__requires__)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 686, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 586, in resolve
if dist not in req:
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2560, in __contains__
if self.index: item = item.parsed_version # only get if we need it
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2169, in parsed_version
self._parsed_version = pv = parse_version(self.version)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2185, in version
"Missing 'Version:' header and/or PKG-INFO file", self
ValueError: ("Missing 'Version:' header and/or PKG-INFO file", marionette-client [unknown version] (/hack/mozilla-graphics/testing/marionette/client))
That's strange as /hack/mozilla-graphics is an old source directory that I haven't used in a while, and isn't the one that I built B2G from, so I don't understand where it's coming from. Also, I don't understand this error. And the gaia-ui-tests repo's wiki is saying that this project was merged into Gaia, but it's not very clear to me in the Gaia wiki what would be the up-to-date steps to run UI tests.
Comment 24•11 years ago
|
||
David and/or Dylan, who's a good person on Gaia that BenWa can talk to in order to sort these issues out, and be able to run the tests?
Flags: needinfo?(dscravaglieri)
Flags: needinfo?(doliver)
Comment 25•11 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #24)
> David and/or Dylan, who's a good person on Gaia that BenWa can talk to in
> order to sort these issues out, and be able to run the tests?
This looks like a marionette setup issue potentially, so I'd ask jgriffin or mdas.
Assignee | ||
Comment 26•11 years ago
|
||
Good news: thanks to great support from mdas I got to run gaia unit tests.
Bad news: test_call_contact.py doesn't exist anymore.
What would be up-to-date STR?
Flags: needinfo?(zcampbell)
Comment 27•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #26)
> Good news: thanks to great support from mdas I got to run gaia unit tests.
Great, if you still need a Gaia contact for more context on the Comms apps, you can probably start with :etienne.
Flags: needinfo?(dscravaglieri)
Flags: needinfo?(doliver)
Assignee | ||
Comment 28•11 years ago
|
||
Alright, so, I can reproduce with some usage of the Contacts app in the B2G emulator, no need for a real device, no need for test runners --- helps a lot to get this in GDB.
Here are some very rough STR. Still not too sure; they work maybe once in 3 attempts (have to restart the emulator between attempts).
1. Boot b2g emulator
2. Start Contacts app
3. Make sure you have at least one contact with a phone number (no need for actual phone network)
4. Hit his phone number to call him
5. Attach GDB to the Contacts app, or the Phone app, not too sure --- whichever one is doing graphics work with a 2D canvas at this point. Unfortunately, adb shell b2g-ps doesn't give process names in the emulator. But in adb logcat, notice the graphics spew about SharedSurface's (may require a debug build). Attach to the process that's generating it.
6. Cancel the phone call, call again, over and over again a few times
7. If that doesn't work, try killing the phone call app while it's dialing: hold the home button pressed to start the task manager, hit the 'close window' icon that appears.
Here's what the crash looks like:
Program received signal SIGSEGV, Segmentation fault.
0x4115aa6c in ~SharedSurface_Gralloc (this=0x45719400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SharedSurfaceGralloc.cpp:135
135 mAllocator->DestroySharedSurface(&desc);
(gdb) bt
#0 0x4115aa6c in ~SharedSurface_Gralloc (this=0x45719400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SharedSurfaceGralloc.cpp:135
#1 0x4115aac8 in ~SharedSurface_Gralloc (this=0x45719400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SharedSurfaceGralloc.cpp:136
#2 0x41172b66 in mozilla::gfx::SurfaceStream::Delete (this=0x4588ba50, surf=@0x4588ba58)
at /hack/mozilla-aurora/gfx/gl/SurfaceStream.cpp:85
#3 0x41172c7c in ~SurfaceStream (this=0x4588ba50, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SurfaceStream.cpp:144
#4 0x41172f32 in ~SurfaceStream_SingleBuffer (this=0x4588ba50, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SurfaceStream.cpp:206
#5 0x41172f60 in ~SurfaceStream_SingleBuffer (this=0x4588ba50, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SurfaceStream.cpp:206
#6 0x41169c52 in ~GLScreenBuffer (this=0x4572b400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLScreenBuffer.cpp:70
#7 0x41169cb8 in ~GLScreenBuffer (this=0x4572b400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLScreenBuffer.cpp:73
#8 0x411663b8 in mozilla::gl::GLContext::DestroyScreenBuffer (this=0x45756800)
at /hack/mozilla-aurora/gfx/gl/GLContext.cpp:3414
#9 0x41163c98 in mozilla::gl::GLContext::MarkDestroyed (this=0x45756800)
at /hack/mozilla-aurora/gfx/gl/GLContext.cpp:1793
#10 0x4115b7b8 in ~GLContextEGL (this=0x45756800, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLContextProviderEGL.cpp:301
#11 0x4115b8ac in ~GLContextEGL (this=0x45756800, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLContextProviderEGL.cpp:317
#12 0x4115e19a in mozilla::detail::GenericRefCounted<(mozilla::detail::RefCountAtomicity)0>::Release (
this=0x45756800) at ../../dist/include/mozilla/GenericRefCounted.h:66
#13 0x412eef86 in mozilla::RefPtr<mozilla::GenericRefCountedBase>::unref (this=0x45823b20,
__in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:203
#14 ~RefPtr (this=0x45823b20, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:153
#15 ~DrawTargetSkia (this=0x45823b20, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/2d/DrawTargetSkia.cpp:146
#16 0x412eefac in ~DrawTargetSkia (this=0xbef3b2b8, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/2d/DrawTargetSkia.cpp:146
#17 0x412ed490 in mozilla::detail::RefCounted<mozilla::gfx::DrawTarget, (mozilla::detail::RefCountAtomicity)1>::Release (this=0x4430a5a0, t=0x0) at ../../dist/include/mozilla/RefPtr.h:82
#18 mozilla::RefPtr<mozilla::gfx::DrawTargetSkia>::unref (this=0x4430a5a0, t=0x0)
at ../../dist/include/mozilla/RefPtr.h:203
#19 mozilla::RefPtr<mozilla::gfx::DrawTargetSkia>::assign (this=0x4430a5a0, t=0x0)
at ../../dist/include/mozilla/RefPtr.h:189
#20 mozilla::RefPtr<mozilla::gfx::DrawTargetSkia>::operator= (this=0x4430a5a0, t=0x0)
at ../../dist/include/mozilla/RefPtr.h:164
#21 0x412ed6a2 in ~SourceSurfaceSkia (this=0x4430a560, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/2d/SourceSurfaceSkia.cpp:27
#22 0x412ed6c4 in ~SourceSurfaceSkia (this=0xbef3b2b8, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/2d/SourceSurfaceSkia.cpp:29
#23 0x408fb124 in mozilla::detail::RefCounted<mozilla::gfx::SourceSurface, (mozilla::detail::RefCountAtomicity)1>::Release (this=<value optimized out>) at ../../../dist/include/mozilla/RefPtr.h:82
#24 0x408fe542 in mozilla::RefPtr<mozilla::gfx::SourceSurface>::unref (this=0x4502dc00)
at ../../../dist/include/mozilla/RefPtr.h:203
---Type <return> to continue, or q <return> to quit---
#25 ~RefPtr (this=0x4502dc00) at ../../../dist/include/mozilla/RefPtr.h:153
#26 mozilla::dom::CanvasRenderingContext2D::Demote (this=0x4502dc00)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:764
#27 0x408fe9ac in mozilla::dom::CanvasRenderingContext2D::DemoteOldestContextIfNecessary ()
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:791
#28 0x408fd230 in mozilla::dom::CanvasRenderingContext2D::EnsureTarget (this=0x45031000)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:857
#29 0x408fdbb8 in mozilla::dom::CanvasRenderingContext2D::TransformWillUpdate (this=0xbef3b2b8)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:1931
#30 0x408fde0a in mozilla::dom::CanvasRenderingContext2D::Scale (this=0xbef3b2b8, x=-nan(0xa9b4000000000),
y=0.62727272727272732, error=...)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:1149
#31 0x40e913de in scale (cx=0x40238320, obj=<value optimized out>, self=0x45031000, args=...)
at /hack/b2g/B2G/objdir-gecko/dom/bindings/CanvasRenderingContext2DBinding.cpp:778
#32 0x40e8fbfc in genericMethod (cx=0x40238320, argc=<value optimized out>, vp=<value optimized out>)
at /hack/b2g/B2G/objdir-gecko/dom/bindings/CanvasRenderingContext2DBinding.cpp:4177
#33 0x414bdd32 in CallJSNative (cx=0x40238320, args=..., construct=<value optimized out>)
at /hack/mozilla-aurora/js/src/jscntxtinlines.h:218
#34 Invoke (cx=0x40238320, args=..., construct=<value optimized out>)
at /hack/mozilla-aurora/js/src/vm/Interpreter.cpp:478
Where are we right now?
(gdb) frame 0
#0 0x4115aa6c in ~SharedSurface_Gralloc (this=0x45719400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SharedSurfaceGralloc.cpp:135
135 mAllocator->DestroySharedSurface(&desc);
(gdb) l
130
131 mGL->MakeCurrent();
132 mGL->fDeleteTextures(1, (GLuint*)&mProdTex);
133
134 SurfaceDescriptor desc(mDesc);
135 mAllocator->DestroySharedSurface(&desc);
136 }
137
138 void
139 SharedSurface_Gralloc::Fence()
(gdb) p mAllocator
$21 = (class mozilla::layers::ISurfaceAllocator * const) 0x43eb8d98
(gdb) p &desc
$22 = (mozilla::layers::SurfaceDescriptor *) 0xbef3b2b8
Note how mAllocator is reported as a pointer of the pure virtual base type, ISurfaceAllocator. That's sort of suspiscious --- suggests a corrupted (maybe deleted, maybe just corrupted) mAllocator, which would explain the immediate crash in this stack frame.
(gdb) p/x *(uint32_t*)mAllocator@10
$25 = {0xfffa9b28, 0x93, 0x0, 0xf8260000, 0xefe, 0x0, 0xd7000000, 0x32ff, 0x127b05, 0xf9190000}
The first 4 bytes pointed to by mAllocator, which have the value 0xfffa9b28, should be the vptr, and indeed,
(gdb) p *mAllocator
$27 = {
_vptr.ISurfaceAllocator = 0xfffa9b28
}
But 0xfffa9b28 is not a mapped address (not in /proc/201/maps). So, *mAllocator is definitely corrupted.
By looking at registers and disassembly, we can confirm that this is what is causing the crash:
(gdb) info registers
r0 0xbef3b2b8 -1091325256
r1 0x0 0
r2 0x0 0
r3 0xfffa9b40 -353472
r4 0x45823b20 1166162720
r5 0x0 0
r6 0x4430a560 1144038752
r7 0x0 0
r8 0x8c 140
r9 0x0 0
r10 0x4502dc74 1157815412
r11 0xbef3b9b8 -1091323464
r12 0x40000000 1073741824
sp 0xbef3b2b0 0xbef3b2b0
lr 0x40e2845d 1088586845
pc 0x4115aa6c 0x4115aa6c <~SharedSurface_Gralloc+92>
cpsr 0x30 48
Disassembly:
0x4115aa3c <+44>: ldr r3, [sp, #4]
0x4115aa3e <+46>: ldr r2, [r3, #28]
0x4115aa40 <+48>: ldr r3, [sp, #4]
0x4115aa42 <+50>: add.w r3, r3, #60 ; 0x3c
0x4115aa46 <+54>: mov r0, r2
0x4115aa48 <+56>: mov.w r1, #1
0x4115aa4c <+60>: mov r2, r3
0x4115aa4e <+62>: bl 0x410efff8 <_ZN7mozilla2gl9GLContext15fDeleteTexturesEiPj>
0x4115aa52 <+66>: ldr r3, [sp, #4]
0x4115aa54 <+68>: add.w r3, r3, #40 ; 0x28
0x4115aa58 <+72>: add r2, sp, #8
0x4115aa5a <+74>: mov r0, r2
0x4115aa5c <+76>: mov r1, r3
0x4115aa5e <+78>: bl 0x40e28454 <SurfaceDescriptor>
0x4115aa62 <+82>: ldr r3, [sp, #4]
0x4115aa64 <+84>: ldr r3, [r3, #36] ; 0x24
0x4115aa66 <+86>: ldr r3, [r3, #0]
0x4115aa68 <+88>: add.w r3, r3, #24
=> 0x4115aa6c <+92>: ldr r3, [r3, #0]
At 0x4115aa62 we are loading the 'this' pointer, stored at sp+4, into r3:
(gdb) p this
$7 = (mozilla::gl::SharedSurface_Gralloc *) 0x45719400
(gdb) p/x *(uint32_t*)0xbef3b2b4
$10 = 0x45719400
At 0x4115aa64 we are reading a value stored 36 bytes into *this, storing it into r3:
(gdb) p/x *(uint32_t*)(0x45719400 + 36)
$13 = 0x43eb8d98
(gdb) p mAllocator
$15 = (class mozilla::layers::ISurfaceAllocator * const) 0x43eb8d98
So the value we just stored in r3 is mAllocator.
At 0x4115aa66 we are trying to dereference the vptr, storing the result again into r3.
The value that we are reading is:
(gdb) p/x *(uint32_t*)mAllocator
$24 = 0xfffa9b28
That's where things go wrong: we just read a bad vptr pointer, 0xfffa9b28.
Apparently we want to call the virtual method at offset 24 into the vtable:
At 0x4115aa68 we are adding 24 to r3, so we get 0xfffa9b40, the value of r3 at the time of the crash. Clearly, that method is ISurfaceAllocator::DestroySharedSurface.
The crash occurs at 0x4115aa6c as we try to dereference that to call DestroySharedSurface.
The question is why does the ISurfaceAllocator get corrupted? Does it get deleted too early, or does it get stomped over?
Flags: needinfo?(zcampbell)
Assignee | ||
Comment 29•11 years ago
|
||
Got a stack to where we are destroying that ISurfaceAllocator, which turns out to be a ClientLayerManager!
(gdb) bt
#0 0x411f1b32 in ~ClientLayerManager (this=0x4405e500, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/layers/client/ClientLayerManager.cpp:55
#1 0x411f1bb0 in ~ClientLayerManager (this=0x4405e500, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/layers/client/ClientLayerManager.cpp:60
#2 0x408401e4 in mozilla::layers::LayerManager::Release (this=0x1f) at ../../dist/include/Layers.h:163
#3 0x41259760 in ~nsRefPtr (this=0x446e8378, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:887
#4 0x41259846 in ~SurfaceFactory_Gralloc (this=0x446e8300, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SharedSurfaceGralloc.h:96
#5 0x4125987c in ~SurfaceFactory_Gralloc (this=0x446e8300, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/SharedSurfaceGralloc.h:96
#6 0x41268786 in ~GLScreenBuffer (this=0x43a47b00, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLScreenBuffer.cpp:69
#7 0x41268808 in ~GLScreenBuffer (this=0x43a47b00, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLScreenBuffer.cpp:73
#8 0x41264f08 in mozilla::gl::GLContext::DestroyScreenBuffer (this=0x44931400) at /hack/mozilla-aurora/gfx/gl/GLContext.cpp:3414
#9 0x412627f8 in mozilla::gl::GLContext::MarkDestroyed (this=0x44931400) at /hack/mozilla-aurora/gfx/gl/GLContext.cpp:1793
#10 0x4125a31c in ~GLContextEGL (this=0x44931400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLContextProviderEGL.cpp:301
#11 0x4125a410 in ~GLContextEGL (this=0x44931400, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/gfx/gl/GLContextProviderEGL.cpp:317
#12 0x4125ccfa in mozilla::detail::GenericRefCounted<(mozilla::detail::RefCountAtomicity)0>::Release (this=0x44931400)
at ../../dist/include/mozilla/GenericRefCounted.h:66
#13 0x413ec592 in mozilla::RefPtr<mozilla::GenericRefCountedBase>::unref (this=0x449ec820, __in_chrg=<value optimized out>)
at ../../dist/include/mozilla/RefPtr.h:203
#14 ~RefPtr (this=0x449ec820, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:153
#15 ~DrawTargetSkia (this=0x449ec820, __in_chrg=<value optimized out>) at /hack/mozilla-aurora/gfx/2d/DrawTargetSkia.cpp:146
#16 0x413ec5b8 in ~DrawTargetSkia (this=0x1f, __in_chrg=<value optimized out>) at /hack/mozilla-aurora/gfx/2d/DrawTargetSkia.cpp:146
#17 0x4081657c in mozilla::detail::RefCounted<mozilla::gfx::DrawTarget, (mozilla::detail::RefCountAtomicity)1>::Release (
this=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:82
#18 0x409fa7b8 in mozilla::RefPtr<mozilla::gfx::DrawTarget>::unref (this=0x453e5c30, t=0x41b57b58)
at ../../../dist/include/mozilla/RefPtr.h:203
#19 mozilla::RefPtr<mozilla::gfx::DrawTarget>::assign (this=0x453e5c30, t=0x41b57b58) at ../../../dist/include/mozilla/RefPtr.h:189
#20 mozilla::RefPtr<mozilla::gfx::DrawTarget>::operator= (this=0x453e5c30, t=0x41b57b58) at ../../../dist/include/mozilla/RefPtr.h:164
#21 0x409fc848 in mozilla::dom::CanvasRenderingContext2D::Reset (this=0x453e5c00)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:616
---Type <return> to continue, or q <return> to quit---
#22 0x409ffcba in ~CanvasRenderingContext2D (this=0x1f, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:546
#23 0x409ffd70 in ~CanvasRenderingContext2D (this=0x1f, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:559
#24 0x409fa4ee in mozilla::dom::CanvasRenderingContext2D::DeleteCycleCollectable (this=0x1f)
at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.cpp:463
#25 0x409fa392 in mozilla::dom::CanvasRenderingContext2D::cycleCollection::DeleteCycleCollectable (this=<value optimized out>,
p=0x41b57b58) at /hack/mozilla-aurora/content/canvas/src/CanvasRenderingContext2D.h:407
#26 0x4118826e in ~SnowWhiteKiller (this=0xbed86ef4, __in_chrg=<value optimized out>)
at /hack/mozilla-aurora/xpcom/base/nsCycleCollector.cpp:1993
#27 0x41188eba in nsCycleCollector::FreeSnowWhite (this=0x40375000, aUntilNoSWInPurpleBuffer=<value optimized out>)
at /hack/mozilla-aurora/xpcom/base/nsCycleCollector.cpp:2102
#28 0x41188ef0 in nsCycleCollector_doDeferredDeletion () at /hack/mozilla-aurora/xpcom/base/nsCycleCollector.cpp:3153
#29 0x40d3e91e in AsyncFreeSnowWhite::Run (this=0x403de040) at /hack/mozilla-aurora/js/xpconnect/src/XPCJSRuntime.cpp:231
#30 0x4118238c in nsThread::ProcessNextEvent (this=0x40316940, mayWait=<value optimized out>, result=0xbed86f97)
at /hack/mozilla-aurora/xpcom/threads/nsThread.cpp:622
#31 0x4115fcc4 in NS_ProcessNextEvent (thread=0xc298144, mayWait=false) at /hack/mozilla-aurora/xpcom/glue/nsThreadUtils.cpp:238
#32 0x40ea1208 in mozilla::ipc::MessagePump::Run (this=0x40301be0, aDelegate=0xbed878a4)
at /hack/mozilla-aurora/ipc/glue/MessagePump.cpp:81
#33 0x40ea12ba in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40301be0, aDelegate=0xbed878a4)
at /hack/mozilla-aurora/ipc/glue/MessagePump.cpp:234
#34 0x4119e968 in MessageLoop::RunInternal (this=0x1000000) at /hack/mozilla-aurora/ipc/chromium/src/base/message_loop.cc:220
#35 0x4119e9e6 in MessageLoop::RunHandler (this=0xbed878a4) at /hack/mozilla-aurora/ipc/chromium/src/base/message_loop.cc:213
#36 MessageLoop::Run (this=0xbed878a4) at /hack/mozilla-aurora/ipc/chromium/src/base/message_loop.cc:187
#37 0x40e50af0 in nsBaseAppShell::Run (this=0x43595d60) at /hack/mozilla-aurora/widget/xpwidgets/nsBaseAppShell.cpp:161
#38 0x406c6576 in XRE_RunAppShell () at /hack/mozilla-aurora/toolkit/xre/nsEmbedFunctions.cpp:679
#39 0x40ea1288 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40301be0, aDelegate=0xbed878a4)
at /hack/mozilla-aurora/ipc/glue/MessagePump.cpp:201
#40 0x4119e968 in MessageLoop::RunInternal (this=0x43595d60) at /hack/mozilla-aurora/ipc/chromium/src/base/message_loop.cc:220
#41 0x4119e9e6 in MessageLoop::RunHandler (this=0xbed878a4) at /hack/mozilla-aurora/ipc/chromium/src/base/message_loop.cc:213
#42 MessageLoop::Run (this=0xbed878a4) at /hack/mozilla-aurora/ipc/chromium/src/base/message_loop.cc:187
#43 0x406c6a6a in XRE_InitChildProcess (aArgc=-1093109196, aArgv=0xbed879ac, aProcess=1077172224)
at /hack/mozilla-aurora/toolkit/xre/nsEmbedFunctions.cpp:516
#44 0x00008650 in main (argc=7, argv=0xbed87a34) at /hack/mozilla-aurora/ipc/app/MozillaRuntimeMain.cpp:85
Assignee | ||
Comment 30•11 years ago
|
||
This fixes it! Explanation in the in-code comment.
Attachment #810246 -
Flags: review?(jgilbert)
Updated•11 years ago
|
Attachment #810246 -
Flags: review?(jgilbert) → review+
Assignee | ||
Comment 31•11 years ago
|
||
Target Milestone: --- → mozilla27
Assignee | ||
Comment 32•11 years ago
|
||
Comment on attachment 810246 [details] [diff] [review]
Reorder the ScreenBuffer teardown
[Approval Request Comment]
Bug caused by (feature/regressing bug #): A combinations of big changes between gecko 18 and gecko 26: the SurfaceStream stuff from bug 716859, the Gralloc back-end for it from bug 843599, ...
User impact if declined: Crashiness in all the apps that uses canvases, including 2d canvases. That includes the Communications app.
Testing completed (on m-c, etc.): inbound
Risk to taking this patch (and alternatives if risky): very low risk, just reorder deletes. Of course deleting things in wrong order can be crashy, but the current order _is_ crashy ;-) In my local testing, this fixes the crash and doesn't introduce another one so far. Also it inherently makes more sense to destroy things in this order. So, I'm confident.
String or IDL/UUID changes made by this patch: none
Attachment #810246 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 33•11 years ago
|
||
Comment on attachment 810246 [details] [diff] [review]
Reorder the ScreenBuffer teardown
Oh this is koi+, so no need for approval. Landing.
Attachment #810246 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 34•11 years ago
|
||
Updated•11 years ago
|
Keywords: regressionwindow-wanted
Comment 35•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #33)
> Oh this is koi+, so no need for approval. Landing.
We do typically prefer that it at *least* be green on trunk first, though.
Updated•11 years ago
|
status-b2g-v1.2:
--- → fixed
status-firefox25:
--- → wontfix
status-firefox26:
--- → fixed
status-firefox27:
--- → fixed
Comment 36•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 38•11 years ago
|
||
I'm very sorry. It turns out that I got confused by my own debugging helpers. The stack that I pasted on comment 29 can't exist with actual mozilla-central or mozilla-aurora code, because the SurfaceFactory does /not/ actually hold a strong reference to mAllocator. Only with my own local debugging patch does it hold a strong reference to mAllocator.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
|
Assignee | ||
Comment 39•11 years ago
|
||
And while I did get locally to a point where this wasn't crashing anymore (with my own debugging helpers), that does not give us a hint as to what the real fix is, because this was likely leaking a cycle of strong references like this:
CanvasLayer ----> CanvasClient -----> TextureClientShmem ----> DrawTargetSkia
^ |
| |
| V
ClientLayerManager <-- SurfaceFactory_Gralloc <-- GLScreenBuffer <-- GLContext
^
|
|
This one arrow is the strong reference
that my local debugging helpers added.
To be clear, by "my local debugging helpers", I'm not talking about stuff that landed, I'm only explaining why it had seemed to me, in my local testing, that the crash was fixed.
Back to debugging this...
Assignee | ||
Comment 40•11 years ago
|
||
I think I got it. I can't tell if it's the only problem, but it is definitely _a_ problem that seems to explain very well what we see in the stack trace to the actual crash, in comment 28.
In that stack, see how everything is triggered by a DOM call (CanvasRenderingContext2D::Scale at frame 30). This crashes as it eventually tries to do something that relies on the layers IPC system (the crash line with mAllocator->DestroySharedSurface(...), where mALlocator is a ClientLayerManager) but that was already destroyed.
So I tried to understand why the ClientLayerManager was already destroyed, but that is in fact not mysterious or interesting at all: as the DOM is managed by cycle collection, assuming that the layers IPC system is not, there is nothing that could exactly tie their lifetimes and ensure that the DOM doesn't survive the layers IPC system!
So we have two conceivable ways to fix this:
* option #1: have the DOM keep the child-side layers IPC system alive (so that the child-side layers IPC system won't go down until the cycle collector has taken down the DOM). Here 'the DOM' probably means the DOM GlobalWindow / Window / Document (I am not a DOM expert...), and 'the layers IPC system' probably means something around PLayerTransactionChild or ClientLayerManager, but I am not a layers IPC expert.
* option #2: have the layers IPC system notify the DOM when it's going down, so that the DOM knows that any resource that it referenced, that was managed by the layers IPC system, is now neutered/invalid.
Opinions? By default, I am more tempted to try out option #1 because it seems like it could be a very small patch solving an entire class of problems, whereas option #2 would require ad-hoc code for every occurence of this kind of problems. But I am not confident about the consequences of reordering the child processes shutdown sequence in this way.
Comment 41•11 years ago
|
||
I made some investigation on this issue on Unagi:
Gecko http://hg.mozilla.org/mozilla-central/rev/d71579c316c1
Gaia 67243760c86561e365bdd6def519105366e24be3
BuildID 20131001040206
Version 27.0a1
https://bugzilla.mozilla.org/show_bug.cgi?id=923033#c1
We fixed the test and now it's passing but the phone is still crashing
Comment 42•11 years ago
|
||
This crash also continues to be seen by stability testing on v1.2.
Assignee | ||
Comment 43•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #40)
So, talked with some people (:roc, :bent). Option #1 as phrased above is bad, because canvas rendering should not depend on Layers whatsoever, and because canvas elements can exist outside of the DOM tree. The closest modification of option #1 that would make sense would be to split the surface allocation protocol out of PLayerTransaction so it's not tied to Layers anymore, and have that be controlled by the PBrowser protocol. That won't guarantee exact ordering of the shutdown sequence vs. HTML canvas elements.
Which takes us to option #2: we need that anyway. That is actually very easy to implement with WeakPtr's. We have a very simple, risk-free path to fixing each of these bugs by turning raw pointers into weak pointers and checking for nullity before dereferencing them. It's tedious and unsatisfactory, but I believe that it's the right thing to do for B2G v1.2 at least.
Assignee | ||
Comment 44•11 years ago
|
||
This fixes the crash in my local testing. As it is a weak pointer, it doesn't change the lifespan of anything, so I feel confident that this time around, this is an actual fix. (Still sorry about the bad 'fix' earlier on this bug!)
Attachment #813622 -
Flags: review?(sotaro.ikeda.g)
Assignee | ||
Comment 45•11 years ago
|
||
Changed to use atomic refcounting.
Attachment #813622 -
Attachment is obsolete: true
Attachment #813622 -
Flags: review?(sotaro.ikeda.g)
Attachment #813633 -
Flags: review?(sotaro.ikeda.g)
Comment 46•11 years ago
|
||
Comment on attachment 813633 [details] [diff] [review]
Use a WeakPtr to the SurfaceAllocator
The patch seems good to me in general. But I do not well about SharedSurface_Gralloc. It is better to ask a review to a person who know about SharedSurface_Gralloc.
Attachment #813633 -
Flags: review?(sotaro.ikeda.g) → feedback+
Assignee | ||
Updated•11 years ago
|
Attachment #813633 -
Flags: review?(vladimir)
Attachment #813633 -
Flags: review?(jgilbert)
Comment 47•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #46)
> Comment on attachment 813633 [details] [diff] [review]
> Use a WeakPtr to the SurfaceAllocator
>
> The patch seems good to me in general. But I do not well about
> SharedSurface_Gralloc. It is better to ask a review to a person who know
> about SharedSurface_Gralloc.
As we discussed, this usage does not need to be atomic weak pointer. And from an usage of atomic weak pointer, attachment 813633 [details] [diff] [review] seem not correct. If we want to use atomic weak pointer, we should not hold raw pointer, otherwise we should hold strong pointer to extend the life time of the allocator.
Assignee | ||
Comment 48•11 years ago
|
||
Indeed (as discussed offline). The 'Atomic' here is purely cosmetic at the moment. Fixing this seems to require changes in MFBT. Specifically, MFBT's WeakPtr allows casting to raw pointer, but not atomically constructing a RefPtr from a WeakPtr ; for Atomic to be meaningful there, that would have to be the converse. Will file bugs about fixing that.
Meanwhile, for B2G v1.2, this is not an immediate concern as the present code (publishing canvas frames) is currently only run on the main thread. That issue will become concrete in the next few months as we enable canvas-on-workers and let it talk directly to the compositor. But as that is all what we need for B2G v1.2 (and for current mozilla-central), let's continue with this patch as-is for now.
https://tbpl.mozilla.org/?tree=Try&rev=76b6f909dd6e
Assignee | ||
Comment 49•11 years ago
|
||
Filed bug 923554 about making AtomicSupportsWeakPtr actually usable atomically.
Assignee | ||
Comment 50•11 years ago
|
||
Comment on attachment 813633 [details] [diff] [review]
Use a WeakPtr to the SurfaceAllocator
It wouldn't hurt to have :nical's review on making ISurfaceAllocator a SupportsWeakPtr.
Attachment #813633 -
Flags: review?(nical.bugzilla)
Updated•11 years ago
|
Attachment #813633 -
Flags: review?(nical.bugzilla) → review+
Updated•11 years ago
|
No longer blocks: b2g-central-dogfood
Comment 51•11 years ago
|
||
Comment on attachment 813633 [details] [diff] [review]
Use a WeakPtr to the SurfaceAllocator
Review of attachment 813633 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/gl/SharedSurfaceGralloc.cpp
@@ +131,5 @@
> mGL->MakeCurrent();
> mGL->fDeleteTextures(1, (GLuint*)&mProdTex);
>
> + if (mAllocator) {
> + SurfaceDescriptor desc(mDesc);
Let's leave this outside the branch.
@@ +132,5 @@
> mGL->fDeleteTextures(1, (GLuint*)&mProdTex);
>
> + if (mAllocator) {
> + SurfaceDescriptor desc(mDesc);
> + mAllocator->DestroySharedSurface(&desc);
Well, I hope we don't get a race slice between the check and the deref.
Attachment #813633 -
Flags: review?(jgilbert) → review+
Assignee | ||
Comment 52•11 years ago
|
||
Comment on attachment 813633 [details] [diff] [review]
Use a WeakPtr to the SurfaceAllocator
2 reviews = enough, but it's still very useful to have as many people as possible aware of these issues.
Attachment #813633 -
Flags: review?(vladimir)
Assignee | ||
Comment 53•11 years ago
|
||
Comment 54•11 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 55•11 years ago
|
||
Updated•11 years ago
|
Comment 56•11 years ago
|
||
Deep sadness. We still see this crash manifesting on v1.2 with the landed patch here. Should I reopen this bug or create a new one?
Flags: needinfo?(bjacob)
Assignee | ||
Comment 57•11 years ago
|
||
Do you have steps to reproduce? The steps-to-reproduce that used to allow me to reproduce this crash, do not reproduce anymore with the patch that landed (I carefully checked, the second time!).
Maybe once we'll know STR we can decide whether to reopen or file a separate bug.
Also, do you know if at least the crash occurs less frequently with the patch that landed?
Flags: needinfo?(bjacob)
Assignee | ||
Comment 58•11 years ago
|
||
...that said, one can take a guess as to what might still be crashing. The line that used to crash was
mAllocator->DestroySharedSurface(&desc);
This could crash for two reasons: either (1) the 'mAllocator' being dead, or (2) the gralloc buffer referenced by 'desc' being dead. Everytime I reproduced this, we were in case (1), so in my patch I fixed (1) and that fixed the bug for me. I assumed that for whatever reason (2) didn't seem to happen and I wanted to make minimal changes to B2G v1.2. But if we lack STR for this crash and have to make guesses, (2) is probably my best guess at this point. If that is the problem, then it's 'the usual' GrallocBufferActor crash, which we've dealt with before in other cases (e.g. bug 912725), so it's not too scary to handle it there too.
Comment 59•11 years ago
|
||
I don't have STR beyond run monkey for 8 hours overnight. We've seen the crash manifest on v1.2 builds from Oct 10, 13, 14, 15 -- but I don't have data on the frequency of the crash per build. Looking at the crash history, looks like about the same frequency before/after the patch in this bug landed.
Comment 60•11 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #59)
> I don't have STR beyond run monkey for 8 hours overnight. We've seen the
> crash manifest on v1.2 builds from Oct 10, 13, 14, 15 -- but I don't have
> data on the frequency of the crash per build. Looking at the crash
> history, looks like about the same frequency before/after the patch in this
> bug landed.
FWIW - we saw just saw this crash today in bug 927433 when launching the music app.
Comment 62•11 years ago
|
||
(this crash has manifested twice in our testing today)
Updated•11 years ago
|
Target Milestone: mozilla27 → 1.2 C3(Oct25)
Updated•11 years ago
|
Keywords: topcrash-b2g
Priority: -- → P4
Updated•11 years ago
|
Priority: P4 → P1
Comment 63•11 years ago
|
||
This is blocking partner certification, but we do need to find some reproducible testcase to have confidence we fixed it. Is there any information beyond that one stack trace?
Comment 64•11 years ago
|
||
I'm confident that we have not fixed this yet as we continue to see this crash regularly on v1.2 as a part of nightly monkey testing. No better STR from me other than let a monkey work on the device for a couple hours unfortunately. We can easily locally add WIP patches to improve logging and/or try out fixes though
Assignee | ||
Comment 65•11 years ago
|
||
The ongoing gfx work week in Paris is keeping busy the people who could work on this (me included) but the good news is that we have discussed this class of crashes (user-after-free of GrallocBufferActor, assuming that that is what the remaining crashes here are) and agreed on a short-term solution to this class of crashes.
Meanwhile I still can't reproduce but will submit asap a patch for you to try.
Assignee | ||
Comment 66•11 years ago
|
||
:m1, could you please try this patch?
It should fix this crash if my theory, that the problem is a dead GrallocBufferActor, is correct (and given the crash line and the fact that we already fixed the case of a dead SurfaceAllocator, I don't see any other possibility).
Unfortunately I still can't reproduce so I have to ask you to try this for me. Thanks!
Flags: needinfo?(mvines)
Comment 67•11 years ago
|
||
We applied the patch from attachment 821324 [details] [diff] [review]
But there is a new crash after applying it:
[@ mozilla::gl::SharedSurface_Gralloc::Create(mozilla::gl::GLContext*, mozilla::gl::GLFormats const&, nsIntSize const&, bool, mozilla::layers::ISurfaceAllocator*) | mozilla::gl::SurfaceFactory_Gralloc::CreateShared(nsIntSize const&) | mozilla::gfx::SurfaceFactory::NewSharedSurface(nsIntSize const&) | mozilla::gfx::SurfaceStream::New(mozilla::gfx::SurfaceFactory*, nsIntSize const&, mozilla::gfx::SharedSurface*&) ]
I'll attach the minidump, etc.
Comment 68•11 years ago
|
||
Comment 69•11 years ago
|
||
Comment 70•11 years ago
|
||
The app URL mentioned in attachment 821797 [details] could be considered suspicious (hi membuster!) so I'll add that we saw the same crash signature manifest in the following apps as well:
app://communications.gaiamobile.org/manifest.webapp
app://crystalskull.gaiamobile.org/manifest.webapp
app://ds-test.gaiamobile.org/manifest.webapp
app://geoloc.gaiamobile.org/manifest.webapp
app://wallpaper.gaiamobile.org/manifest.webapp
Flags: needinfo?(mvines)
Comment 71•11 years ago
|
||
hi guys I have just replicated this twice on today's nightly Hamachi master build.
https://crash-stats.mozilla.com/report/index/ede217a3-4900-487e-bb79-b15102131024
https://crash-stats.mozilla.com/report/index/becba26a-e0f8-42e4-91f6-665532131024
The replication is the same as comment #3 using the test automation but it is less frequent, maybe 1 in 5 rather than 5 in 5.
It is also crashing differently, a 'soft' crash with just a prompt in Gaia rather than a reboot crash, which means it's a bit tricky for our automation to detect.
It is not our automation submitting these reports (unless we observe, interrupt the test and submit the report like I did above) so it's occurring in the wild, too.
Build:
Gecko http://hg.mozilla.org/mozilla-central/rev/19fd3388c372
Gaia 62a84888f1c10556af8fbbd50e126bb1afe96276
BuildID 20131024040207
Version 27.0a1
Assignee | ||
Comment 72•11 years ago
|
||
Sorry that I didn't get back to you earlier. I still can't reproduce crashes locally, with or without the previous patch that I submitted. I don't know what that is! Note that I am only testing on b2g1.2, not on mozilla-central.
Anyway, rather than testing yet another quick and dirty fix, here is what should be the 'right' fix. It works for me locally, but then again, I couldn't reproduce the crash that it's supposed to fix. But at least, it makes sense.
I'm going to attach a combined patch to make it easier to apply to b2g1.2.
Assignee | ||
Comment 73•11 years ago
|
||
Michael, please test this on b2g 1.2 ! Only apply this patch, no need for anything else.
(Or would you like me to make a mozilla-central version now?)
Flags: needinfo?(mvines)
Assignee | ||
Comment 74•11 years ago
|
||
There were some issues in the previous combined patch; here is the latest and greatest. I'd be very interested in testing!
Attachment #824341 -
Attachment is obsolete: true
Comment 75•11 years ago
|
||
From the crash info in Comment 71, I feel that the problem might not related to SurfaceDescriptor nor GrallocBufferActor, but related to SharedSurface.
Assignee | ||
Comment 76•11 years ago
|
||
Thanks Sotaro! It seems that you are right about this being unrelated to GrallocBufferActor:
The crash address is 0x18 which is 24, which is the offset of DestroySharedSurface in the ISurfaceAllocator vtable (because it is the 7th virtual method in this class), which we already encountered in comment 28.
So it seems that mAllocator is still nullptr... as if the WeakPtr didn't work. But the other threads stacks in the crash reports do not seem to suggest a race condition, so at the moment I don't understand what is happening.
Assignee | ||
Comment 77•11 years ago
|
||
The code really is
if (mAllocator) {
mAllocator->DestroySharedSurface(&desc);
}
so we must be having a race condition here, for mAllocator to be null by the time we call DestroySharedSurface!
Comment 78•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #77)
Should I hold off on testing with attachment 824635 [details] [diff] [review] then?
Updated•11 years ago
|
Flags: needinfo?(bjacob)
Updated•11 years ago
|
Flags: needinfo?(mvines)
Assignee | ||
Comment 79•11 years ago
|
||
Yes, please hold off for now. Sorry about the noise.
Flags: needinfo?(bjacob)
Comment 80•11 years ago
|
||
Remove 932537 from dependent bug from Comment 75, Comment 76.
No longer depends on: 932537
Comment 81•11 years ago
|
||
Strange part of the crash is that stack shows the destructor of SurfaceStream_SingleBuffer. In normal use cases, SurfaceStream_SingleBuffer is destroyed soon after creation. It is replaced by new omtc supported one in GLScreenBuffer::Morph() during ClientCanvasLayer::Initialize().
Comment 82•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #81)
> Strange part of the crash is that stack shows the destructor of
> SurfaceStream_SingleBuffer. In normal use cases, SurfaceStream_SingleBuffer
> is destroyed soon after creation. It is replaced by new omtc supported one
> in GLScreenBuffer::Morph() during ClientCanvasLayer::Initialize().
Oh, I understand it. If canvas is used for thumbnail generation, SurfaceStream_SingleBuffer is used without change.
Assignee | ||
Updated•11 years ago
|
Depends on: RefcntAllocator
Comment 83•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #82)
> (In reply to Sotaro Ikeda [:sotaro] from comment #81)
> > Strange part of the crash is that stack shows the destructor of
> > SurfaceStream_SingleBuffer. In normal use cases, SurfaceStream_SingleBuffer
> > is destroyed soon after creation. It is replaced by new omtc supported one
> > in GLScreenBuffer::Morph() during ClientCanvasLayer::Initialize().
>
> Oh, I understand it. If canvas is used for thumbnail generation,
> SurfaceStream_SingleBuffer is used without change.
I'm just going to reaffirm that this is what should be happening. (from my knowledge of snapshotting)
Assignee | ||
Comment 84•11 years ago
|
||
This combined patch, which applies on B2G 1.2, implements the completely different approach taken in bug 933082 (I filed a separate bug to discuss these patches, because I didn't want to spam the long CC list here).
I really hope that this fixes the crash, and if somehow it doesn't, at least we'll learn something, because it changes completely the way that this mAllocator is managed.
Michael, Greg, please try it!
Comment 85•11 years ago
|
||
(In reply to Greg Grisco from comment #68)
> Created attachment 821796 [details]
> decoded minidump after applying 821324
From attachment 821796 [details], in SharedSurface_Gralloc::Create(), it become clear that argument allocator is used without nullptr check.
Comment 86•11 years ago
|
||
SharedSurface_Gralloc is still holding raw pointer of layers::ISurfaceAllocator.
http://mxr.mozilla.org/mozilla-central/source/gfx/gl/SharedSurfaceGralloc.h#41
WeakPtr<layers::ISurfaceAllocator> is used only in SurfaceFactory_Gralloc.
Comment 87•11 years ago
|
||
bjacob, before trying attachment 825046 [details] [diff] [review], it might be better to fix Comment 85 and Comment 86.
Flags: needinfo?(bjacob)
Assignee | ||
Updated•11 years ago
|
No longer depends on: RefcntAllocator
Assignee | ||
Comment 88•11 years ago
|
||
Sotaro: Thanks, you are right that we were only using WeakPtr in one of two places. Also, Matt's comments on bug 933082 showed me how my interpretation was wrong. The crash at address 0x18 means that mAllocator->vptr is null, which means that *mAllocator was destroyed already. It does not mean that mAllocator (as a pointer) is null. So with these two things together, the mystery is resolved.... sorry!
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(mvines)
Flags: needinfo?(ggrisco)
Flags: needinfo?(bjacob)
Assignee | ||
Updated•11 years ago
|
Attachment #825046 -
Attachment is obsolete: true
Assignee | ||
Comment 89•11 years ago
|
||
Sotaro, do you see more that could be done, than this? In my understanding, this should fix the crashes reported in comment 71.
The other crash that you mentioned in comment 85 is the one mentioned in comment 68; it is with another patch applied, that we are not considering anymore. So maybe we don't need to worry about it.
Attachment #825255 -
Flags: feedback?(sotaro.ikeda.g)
Comment 90•11 years ago
|
||
> The other crash that you mentioned in comment 85 is the one mentioned in
> comment 68; it is with another patch applied, that we are not considering
> anymore. So maybe we don't need to worry about it.
Yeah, I checked the code, we do not need to worry about it.
Updated•11 years ago
|
Attachment #825255 -
Flags: feedback?(sotaro.ikeda.g) → feedback+
Comment 92•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #89)
> Sotaro, do you see more that could be done, than this? In my understanding,
> this should fix the crashes reported in comment 71.
About this bug, I do not know another thing need to do. By Bug 933082, all holding reference to ISurfaceAllocator need to change to thread-safe strong reference.
Assignee | ||
Comment 93•11 years ago
|
||
Comment on attachment 825255 [details] [diff] [review]
Use WeakPtr also in SharedSurface_Gralloc
OK, on to review.
Attachment #825255 -
Flags: review?(sotaro.ikeda.g)
Updated•11 years ago
|
Attachment #825255 -
Flags: review?(sotaro.ikeda.g) → review+
Assignee | ||
Comment 94•11 years ago
|
||
https://hg.mozilla.org/integration/b2g-inbound/rev/e361a255253f
Credits really go to Sotaro and Matt for finding how my interpretation of this crash was wrong!
Comment 95•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #94)
Can you summarize what patches should be applied for re-test? Is it just attachment 825255 [details] [diff] [review]?
Thanks,
-Greg
Updated•11 years ago
|
Flags: needinfo?(bjacob)
Assignee | ||
Comment 96•11 years ago
|
||
(In reply to Greg Grisco from comment #95)
> Can you summarize what patches should be applied for re-test? Is it just
> attachment 825255 [details] [diff] [review]?
Yes, just this one tiny patch. Thanks!
Flags: needinfo?(bjacob)
Comment 97•11 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 98•11 years ago
|
||
Comment 99•11 years ago
|
||
After applying the patch we could not reproduce in last night's test run. Thanks!
Comment 100•11 years ago
|
||
v cool!
I'll try my steps from comment #71 again too.
Assignee | ||
Comment 101•11 years ago
|
||
\o/ Thanks really go to Sotaro!
Reporter | ||
Comment 102•11 years ago
|
||
Verified via socorro and not seeing crashes based on this crash signature.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•