Closed
Bug 831313
Opened 10 years ago
Closed 10 years ago
Scroll performance testcase with moz-transform causes crash and reboot of Otoro device
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:tef+, firefox20 wontfix, firefox21 wontfix, firefox22 fixed, b2g18+ fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)
People
(Reporter: martijn.martijn, Assigned: mattwoodrow)
Details
(Keywords: crash, testcase, Whiteboard: [b2g-crash] QARegressExclude)
Crash Data
Attachments
(3 files)
Perhaps the same as/related to bug 820175.
Steps to reproduce:
- Go to testcase url
- Tap on the "Scroll with moztransform test" link
- Make some pinch zooming movements in and out
Result: crash on my Otoro device
Updated•10 years ago
|
blocking-b2g: --- → tef?
Updated•10 years ago
|
Whiteboard: [b2g-crash]
Comment 1•10 years ago
|
||
I've CC'd a few people who could perhaps help us figure out what is causing this. We discussed this during triage today and while we really don't want to have this type of crasher, the test case is enough of a stress test that we don't think it's something we should block on.
blocking-b2g: tef? → -
tracking-b2g18:
--- → +
Comment 2•10 years ago
|
||
Can you please provide logcat of this crash?
Comment 3•10 years ago
|
||
We had another bug with this same testcase, right?
Comment 4•10 years ago
|
||
backtrace from gdb in the b2g process:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 824.856]
jemalloc_crash () at /home/cervantes/hg/mozilla-central/memory/mozjemalloc/jemalloc.c:1582
1582 MOZ_CRASH();
(gdb) bt
#0 jemalloc_crash () at /home/cervantes/hg/mozilla-central/memory/mozjemalloc/jemalloc.c:1582
#1 0x4002b954 in arena_run_reg_dalloc (ptr=<value optimized out>, offset=<value optimized out>) at /home/cervantes/hg/mozilla-central/memory/mozjemalloc/jemalloc.c:3329
#2 arena_dalloc_small (ptr=<value optimized out>, offset=<value optimized out>) at /home/cervantes/hg/mozilla-central/memory/mozjemalloc/jemalloc.c:4540
#3 arena_dalloc (ptr=<value optimized out>, offset=<value optimized out>) at /home/cervantes/hg/mozilla-central/memory/mozjemalloc/jemalloc.c:4668
#4 0x4002cd9a in free (ptr=0x48cf4340) at /home/cervantes/hg/mozilla-central/memory/mozjemalloc/jemalloc.c:6589
#5 0x40026c98 in _ZdlPv (ptr=0x48cf4340) at /home/cervantes/hg/mozilla-central/memory/build/mozmemory_wrap.c:62
#6 0x42b4f540 in gralloc::gpu_context_t::free_impl (this=<value optimized out>, hnd=0x48cf4340) at hardware/qcom/display/libgralloc/gpu.cpp:285
#7 0x42b4f8c6 in gralloc::gpu_context_t::alloc_impl (this=0x4a9fdb70, w=64, h=6, format=1, usage=307, pHandle=0x48cf4340, pStride=0x48cf432c, bufferSize=0) at hardware/qcom/display/libgralloc/gpu.cpp:256
#8 0x42b4f926 in gralloc::gpu_context_t::gralloc_alloc (dev=0x48cf4340, w=<value optimized out>, h=<value optimized out>, format=<value optimized out>, usage=307, pHandle=0x48cf4340, pStride=0x48cf432c) at hardware/qcom/display/libgralloc/gpu.cpp:296
#9 0x402ba2e6 in android::GraphicBufferAllocator::alloc (this=<value optimized out>, w=64, h=6, format=1, usage=307, handle=0x48cf4340, stride=0x48cf432c) at frameworks/base/libs/ui/GraphicBufferAllocator.cpp:102
#10 0x402b9c62 in android::GraphicBuffer::initSize (this=0x48cf4300, w=64, h=6, format=1, reqUsage=307) at frameworks/base/libs/ui/GraphicBuffer.cpp:149
#11 0x402b9fd6 in GraphicBuffer (this=0x48cf4300, w=64, h=6, reqFormat=1, reqUsage=307) at frameworks/base/libs/ui/GraphicBuffer.cpp:62
#12 0x418ed824 in mozilla::layers::GrallocBufferActor::Create (aSize=..., aContent=@0x46cff7b0, aOutHandle=0x46cff794) at /home/cervantes/hg/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:208
#13 0x418eb4c6 in mozilla::layers::ShadowLayersParent::AllocPGrallocBuffer (this=<value optimized out>, aSize=<value optimized out>, aContent=<value optimized out>, aOutHandle=0x1) at /home/cervantes/hg/mozilla-central/gfx/layers/ipc/ShadowLayersParent.cpp:500
#14 0x41640a4e in mozilla::layers::PLayersParent::OnMessageReceived (this=0x477b7f00, __msg=<value optimized out>, __reply=@0x46cffc0c) at /home/cervantes/git/b2g-device2/B2G/objdir-gecko-dbg/ipc/ipdl/PLayersParent.cpp:452
#15 0x416350a2 in mozilla::layers::PCompositorParent::OnMessageReceived (this=0x437bc6f0, __msg=..., __reply=@0x46cffc0c) at /home/cervantes/git/b2g-device2/B2G/objdir-gecko-dbg/ipc/ipdl/PCompositorParent.cpp:411
#16 0x415e06a4 in mozilla::ipc::SyncChannel::OnDispatchMessage (this=0x437bc6f8, msg=...) at /home/cervantes/hg/mozilla-central/ipc/glue/SyncChannel.cpp:145
#17 0x415de10a in mozilla::ipc::RPCChannel::OnMaybeDequeueOne (this=0x437bc6f8) at /home/cervantes/hg/mozilla-central/ipc/glue/RPCChannel.cpp:400
#18 0x415ad8cc in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/tuple.h:383
#19 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/task.h:307
#20 0x415dc566 in mozilla::ipc::RPCChannel::RefCountedTask::Run (this=0x4779daa0) at ../../dist/include/mozilla/ipc/RPCChannel.h:425
#21 mozilla::ipc::RPCChannel::DequeueTask::Run (this=0x4779daa0) at ../../dist/include/mozilla/ipc/RPCChannel.h:448
#22 0x4185da62 in MessageLoop::RunTask (this=0x46cffdd0, task=0x4779daa0) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:333
#23 0x4185e28c in MessageLoop::DeferOrRunPendingTask (this=0x133, pending_task=<value optimized out>) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:341
#24 0x4185efde in MessageLoop::DoWork (this=0x46cffdd0) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:441
#25 0x4185f35a in base::MessagePumpDefault::Run (this=0x4603d880, delegate=0x46cffdd0) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_pump_default.cc:23
#26 0x4185e016 in MessageLoop::RunInternal (this=0x46cffdd0) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:215
#27 0x4185e076 in MessageLoop::RunHandler (this=0x46cffdd0) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:208
#28 MessageLoop::Run (this=0x46cffdd0) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:182
#29 0x41867fdc in base::Thread::ThreadMain (this=0x46049f40) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/thread.cc:156
#30 0x41875ba2 in ThreadFunc (closure=0x133) at /home/cervantes/hg/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:39
#31 0x4005be18 in __thread_entry (func=0x41875b99 <ThreadFunc>, arg=0x46049f40, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#32 0x4005b96c in pthread_create (thread_out=<value optimized out>, attr=0xbee45238, start_routine=0x41875b99 <ThreadFunc>, arg=0x46049f40) at bionic/libc/bionic/pthread.c:357
#33 0x4a30fd20 in ?? ()
Cannot access memory at address 0x0
#34 0x4a30fd20 in ?? ()
Cannot access memory at address 0x0
Updated•10 years ago
|
Crash Signature: [@ jemalloc_crash | arena_run_reg_dalloc | arena_dalloc_small | arena_dalloc | free | _ZdlPv]
Comment 5•10 years ago
|
||
That's a very interesting crash stack, but I would also very much appreciate logcat here. (In fact, I would appreciate if /all/ crashes reported by QA came with logcat and gdb stacks, where feasible.)
Comment 6•10 years ago
|
||
The jemalloc assertion indicates that we're freeing an interior pointer or doing some other badness. This looks to me like it may be a bug in the qcom driver. To wit, hardware/qcom/display/libgralloc/gpu.cpp's gpu_context_t::alloc_impl does:
    err = genlock_create_lock((native_handle_t*)(*pHandle));
    if (err) {
        LOGE("%s: genlock_create_lock failed", __FUNCTION__);
        free_impl(reinterpret_cast<private_handle_t*>(pHandle));
        return err;
    }
free_impl then eventually does |delete hnd|. The delete is what's causing us to crash. This is sketchy to me because it seems that frameworks/base/libs/ui/GraphicBuffer.cpp owns |handle|, not gpu_context_t. Also gpu_context_t's free_impl doesn't look anything like GraphicBuffer's free_handle. I don't see where it's allocated exactly, but my guess would be that handle isn't malloc()'ed, or is an interior pointer into some larger malloc()'ed block.
Reporter | ||
Comment 7•10 years ago
|
||
This is a catlog of logcat, but I doubt it is useful. I didn't get crash stack ids when b2g rebooted. Occasionally the content process in the browser itself also crashes; this is the crash stack id for that: https://crash-stats.mozilla.com/report/index/d0124680-a215-408c-9922-3ea8b2130117
Comment 8•10 years ago
|
||
Renoming: This looks quite bad, and could very well be exploitable.
blocking-b2g: - → tef?
Comment 9•10 years ago
|
||
Let's block on this to at least investigate. Milan, can you help find an assignee?
Assignee: nobody → milan
blocking-b2g: tef? → tef+
Comment 11•10 years ago
|
||
Actually, Benoit is chasing a big regression in 18 - Jeff, can you take a quick look?
Assignee: bjacob → jmuizelaar
Comment 12•10 years ago
|
||
Michael, can you get someone from qualcomm to comment on what's going on here?
Flags: needinfo?(mvines)
Comment 13•10 years ago
|
||
This bug comes from confusing a pointer with a pointer to a pointer in gpu.cpp. gralloc_alloc_framebuffer_locked() and gralloc_alloc_buffer() both do the following (taking gralloc_alloc_framebuffer_locked() as the example):
    line 116: private_handle_t* hnd = new private_handle_t(...);
and then at the end:
    *pHandle = hnd;
where pHandle is of type buffer_handle_t*. Then, back in the caller gpu_context_t::alloc_impl():
    line 242: err = gralloc_alloc_framebuffer(size, usage, pHandle);
    ...
    err = genlock_create_lock((native_handle_t*)(*pHandle));
    if (err) {
        LOGE("%s: genlock_create_lock failed", __FUNCTION__);
        free_impl(reinterpret_cast<private_handle_t*>(pHandle));
        return err;
    }
It looks like gralloc_alloc_framebuffer_locked() wants to change the pointer in gpu_context_t::alloc_impl(), so that if genlock_create_lock() fails (for example, because too many FDs are open), alloc_impl() frees the private_handle_t instance allocated in gralloc_alloc_framebuffer_locked(). Actually it doesn't, because pHandle still points to the value passed in. We crash 100% of the time whenever genlock_create_lock() returns a non-zero value. We need Qualcomm to fix this bug.
Comment 14•10 years ago
|
||
(In reply to Cervantes Yu from comment #13)
> err = genlock_create_lock((native_handle_t*)(*pHandle));
> if (err) {
> LOGE("%s: genlock_create_lock failed", __FUNCTION__);
> free_impl(reinterpret_cast<private_handle_t*>(pHandle));
I was wrong in comment #13. buffer_handle_t is actually a pointer type. The problem is simpler here: pHandle is a pointer to a pointer. It should pass *pHandle instead of pHandle to free_impl().
> return err;
> }
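The mix-up described in this comment can be illustrated with a small standalone sketch. Nothing below is the libgralloc code: FakeHandle, fake_buffer_handle_t, allocHandle, freeHandle, cleanupBuggy and cleanupFixed are hypothetical names, and the registry of live handles is an assumption added so that the bad free is reported instead of corrupting the heap as it did on the device.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins for private_handle_t and buffer_handle_t.
struct FakeHandle { int fd; };
typedef const FakeHandle* fake_buffer_handle_t;  // a pointer type, like buffer_handle_t

// Registry of live handles (an assumption of this sketch) so that a bad
// free can be detected rather than handed to the allocator.
static std::set<const FakeHandle*>& liveHandles() {
    static std::set<const FakeHandle*> handles;
    return handles;
}

// Out-parameter allocation: the new handle is written through pHandle.
int allocHandle(fake_buffer_handle_t* pHandle) {
    FakeHandle* hnd = new FakeHandle{-1};
    liveHandles().insert(hnd);
    *pHandle = hnd;
    return 0;
}

// Returns true if hnd was a live handle and has now been deleted,
// false if the pointer was never allocated by allocHandle().
bool freeHandle(const FakeHandle* hnd) {
    if (liveHandles().erase(hnd) == 0) {
        return false;  // code without this check would |delete| a bogus pointer
    }
    delete hnd;
    return true;
}

// Buggy cleanup: reinterprets the address of the caller's variable as a handle.
bool cleanupBuggy(fake_buffer_handle_t* pHandle) {
    return freeHandle(reinterpret_cast<const FakeHandle*>(pHandle));
}

// Fixed cleanup: dereferences once to reach the handle that was allocated.
bool cleanupFixed(fake_buffer_handle_t* pHandle) {
    return freeHandle(*pHandle);
}
```

In this sketch, cleanupBuggy(&h) is rejected because &h is the address of the caller's variable rather than of the allocated object, while cleanupFixed(&h) releases the handle that allocHandle() created.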
Updated•10 years ago
|
Flags: needinfo?(mvines) → needinfo?(dwilson)
Comment 15•10 years ago
|
||
Any news here?
Updated•10 years ago
|
status-b2g18:
--- → affected
status-b2g18-v1.0.0:
--- → affected
Comment 16•10 years ago
|
||
I can reproduce a crash on my otoro with the shared URL. Next up I'll try the suggested patch
Flags: needinfo?(dwilson)
Comment 17•10 years ago
|
||
Let us know how it goes with the suggested patch.
Comment 19•10 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #8)
> Renoming: This looks quite bad, and could very well be exploitable.
The only reason we're tef+ blocking at this point is because this is believed to be exploitable. Is that still the case?
Comment 20•10 years ago
|
||
Misuse of free() is among the most dangerous things one can do.
Updated•10 years ago
|
status-b2g18-v1.0.1:
--- → affected
Comment 22•10 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #20)
> Misuse of free() is among the most dangerous things one can do.
Given that, this should be a blocker for both QC and Mozilla.
Comment 23•10 years ago
|
||
NPOTB for now, since we don't think this is a problem in our source tree.
Whiteboard: [b2g-crash] → [b2g-crash][NPOTB]
Updated•10 years ago
|
Whiteboard: [b2g-crash][NPOTB] → [b2g-crash][NPOTB][target 28/2]
Comment 24•10 years ago
|
||
Removed [target 28/2] because [NPOTB]
Whiteboard: [b2g-crash][NPOTB][target 28/2] → [b2g-crash][NPOTB]
Comment 25•10 years ago
|
||
I applied the suggested patch in Comment 14 but there's still a crash in the test url. Next I'll check if it happens in the same place or further along
Flags: needinfo?(dwilson)
Comment 26•10 years ago
|
||
(In reply to Diego Wilson [:diego] from comment #25)
> I applied the suggested patch in Comment 14 but there's still a crash in
> the test url.
That's expected, because the crash results from a resource outage in the graphics driver. Even if we don't crash here, we are not likely to proceed much further. The point is that we no longer crash because of misuse of free(), which is dangerous.
Comment 27•10 years ago
|
||
Looks like the libgralloc patch does solve the "free()" crash. Now I mostly get the "well this is embarrassing :(" browser page, which I think is what we always want in out-of-mem conditions. I'll send the patch on its way to CAF.
Comment 28•10 years ago
|
||
Comment 29•10 years ago
|
||
That being said, I do see a gecko crash sometimes when the ThebesLayer is trying to paint (crash stack attached). Is there a more graceful way of handling this?
That's the content process crashing right? Don't you get the "well this is embarrassing" browser page for that crash?
Comment 31•10 years ago
|
||
Gecko restarts. It's a ShadowThebesLayer crash so I'm guessing it's on the main process.
It's mozilla::layers::BasicShadowableThebesLayer::CreateBuffer, so I'm guessing it's the content process :-)
Comment 33•10 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)
> It's mozilla::layers::BasicShadowableThebesLayer::CreateBuffer, so I'm
> guessing it's the content process :-)
Is bug 834372 a dupe of this bug then?
Comment 34•10 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #33)
> Is bug 834372 a dupe of this bug then?
Seems related but not quite the same. The dimensions in bug 834372 look invalid. The dimensions in this bug are valid:
#2 NS_DebugBreak_P (aSeverity=<value optimized out>, aStr=0xbe872d58 "creating ThebesLayer 'back buffer' failed! width=320, height=465, type=1000", aExpr=<value optimized out>, aFile=<value optimized out>, aLine=460) at /local/mnt/workspace/dwilson/ztecdr/gecko/xpcom/base/nsDebugImpl.cpp:380
Comment 35•10 years ago
|
||
The libgralloc gpu.cpp patch has been released here: https://www.codeaurora.org/gitweb/quic/lf/?p=b2g/build.git;a=commit;h=cee98d7cbfd59c1c4b4379c4b094aebc0d601c82 And should be found in releases AU_LINUX_GECKO_ICS_STRAWBERRY_V1.01.00.01.19.030 or later
Comment 36•10 years ago
|
||
Unless you guys want to track the ThebesLayer issue here we can close this bug now
Comment 37•10 years ago
|
||
(clearing NPOTB and Diego as assignee, for the ThebesLayer issue that remains in this bug)
Assignee: dwilson → nobody
Whiteboard: [b2g-crash][NPOTB] → [b2g-crash]
Comment 38•10 years ago
|
||
Since this is tef+ we'll need an assignee - starting with Roc for delegation.
Assignee: nobody → roc
I think we're not handling OOM well, or something like that.
Assignee: roc → matt.woodrow
Assignee | ||
Comment 40•10 years ago
|
||
It could be the main process (UI) that is creating Shadowable layers to send to the compositor. This doesn't look like a particularly big allocation, so if it is OOM, then we're likely to have issues in other places too. We can avoid this particular crash fairly easily, but it might result in some fairly broken rendering. Not sure if that's a big improvement.
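The trade-off described in this comment (skip the paint instead of aborting when the back buffer cannot be created) might look roughly like the sketch below. This is not the Gecko patch: BackBuffer, createBackBuffer, paintLayer, PaintResult and the budgetBytes parameter are hypothetical, and the byte budget is only a stand-in for whatever resource the real allocator runs out of.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical back buffer; not a Gecko type.
struct BackBuffer {
    int width = 0;
    int height = 0;
    std::unique_ptr<unsigned char[]> pixels;
};

enum class PaintResult { Painted, SkippedNoBuffer };

// Returns nullptr instead of aborting when the buffer cannot be created.
// budgetBytes simulates resource exhaustion (an assumption of this sketch).
std::unique_ptr<BackBuffer> createBackBuffer(int width, int height,
                                             std::size_t budgetBytes) {
    if (width <= 0 || height <= 0) {
        return nullptr;
    }
    const std::size_t bytes =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 4;
    if (bytes > budgetBytes) {
        return nullptr;  // simulated out-of-resources condition
    }
    std::unique_ptr<unsigned char[]> pixels(new (std::nothrow) unsigned char[bytes]);
    if (!pixels) {
        return nullptr;  // genuine allocation failure
    }
    std::unique_ptr<BackBuffer> buffer(new (std::nothrow) BackBuffer());
    if (!buffer) {
        return nullptr;
    }
    buffer->width = width;
    buffer->height = height;
    buffer->pixels = std::move(pixels);
    return buffer;
}

// The caller treats a missing buffer as "nothing painted this frame".
PaintResult paintLayer(int width, int height, std::size_t budgetBytes) {
    std::unique_ptr<BackBuffer> buffer = createBackBuffer(width, height, budgetBytes);
    if (!buffer) {
        return PaintResult::SkippedNoBuffer;  // rendering may look broken, but no crash
    }
    buffer->pixels[0] = 0xFF;  // placeholder for real drawing
    return PaintResult::Painted;
}
```

With the 320x465 buffer size reported earlier in this bug, paintLayer(320, 465, budget) reports Painted when the budget covers the buffer and SkippedNoBuffer otherwise, which corresponds to trading a crash for missing content.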
Comment 41•10 years ago
|
||
How are things going here?
Assignee | ||
Comment 42•10 years ago
|
||
As I said before, the best this will do is replace a crash with broken rendering.
Attachment #727440 -
Flags: review?(roc)
Attachment #727440 -
Flags: review?(roc) → review+
Assignee | ||
Comment 43•10 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/11d3fabf5b4a
Comment 44•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/11d3fabf5b4a
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 45•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g18/rev/3ccc30d682af https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/82ca4974c4e2
status-firefox20:
--- → wontfix
status-firefox21:
--- → wontfix
status-firefox22:
--- → fixed
Target Milestone: --- → B2G C4 (2jan on)
Updated•10 years ago
|
Whiteboard: [b2g-crash] → [b2g-crash] QARegressExclude
Comment 46•10 years ago
|
||
No Test case creation is needed in moztrap for this issue.
Flags: in-moztrap-
Comment 47•10 years ago
|
||
Cannot verify, need steps to blackbox test this issue.