Bug 838215 - crash in mozilla::layout::RenderFrameParent::ContentReceivedTouch
Status: RESOLVED WORKSFORME
Whiteboard: [b2g-crash]
Keywords: b2g-testdriver, crash, sec-moderate, unagi
Product: Core
Classification: Components
Component: Layout
Version: Trunk
Platform: ARM Gonk (Firefox OS)
Importance: -- critical
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Mentors:
Depends on: 833964
Blocks: orangfuzz b2gSystemSecurity
Reported: 2013-02-05 09:17 PST by Naoki Hirata :nhirata (please use needinfo instead of cc)
Modified: 2014-07-14 22:07 PDT
CC List: 14 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
unreduced orangutan script that crashes at this location after ~20+ minutes (242.40 KB, text/plain)
2013-03-13 19:41 PDT, Gary Kwong [:gkw] [:nth10sd]
no flags
Backtrace for the segfault seen in comment #11 (6.31 KB, text/plain)
2013-04-04 03:21 PDT, Gabriele Svelto [:gsvelto]
no flags

Description Naoki Hirata :nhirata (please use needinfo instead of cc) 2013-02-05 09:17:55 PST
This bug was filed from the Socorro interface and is 
report bp-5b374516-277f-4649-a459-b5eb92130204 .
============================================================= 

Crashing Thread
Frame 	Module 	Signature 	Source
0 		@0xffdfffde 	
1 	libxul.so 	mozilla::layout::RenderFrameParent::ContentReceivedTouch 	RenderFrameParent.cpp:979
2 	libxul.so 	mozilla::dom::TabParent::RecvContentReceivedTouch 	TabParent.cpp:1304
3 	libxul.so 	mozilla::dom::PBrowserParent::OnMessageReceived 	PBrowserParent.cpp:1477
4 	libxul.so 	mozilla::dom::PContentParent::OnMessageReceived 	PContentParent.cpp:1394
5 	libxul.so 	mozilla::ipc::AsyncChannel::OnDispatchMessage 	AsyncChannel.cpp:473
6 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	RPCChannel.cpp:402
7 	libxul.so 	RunnableMethod<IPC::ChannelProxy::Context, void , Tuple0>::Run 	tuple.h:383
8 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:425
9 	libxul.so 	MessageLoop::RunTask 	message_loop.cc:333
10 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	message_loop.cc:341
11 	libxul.so 	MessageLoop::DoWork 	message_loop.cc:441
12 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run 	MessagePump.cpp:42
13 	libxul.so 	nsThread::ProcessNextEvent 	nsThread.cpp:620
14 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:237
15 	libxul.so 	mozilla::ipc::MessagePump::Run 	MessagePump.cpp:117
16 	libxul.so 	MessageLoop::RunInternal 	message_loop.cc:215
17 	libxul.so 	MessageLoop::Run 	message_loop.cc:208
18 	libxul.so 	nsBaseAppShell::Run 	nsBaseAppShell.cpp:163
19 	libxul.so 	nsAppStartup::Run 	nsAppStartup.cpp:290
20 	libxul.so 	XREMain::XRE_mainRun 	nsAppRunner.cpp:3794
21 	libxul.so 	XREMain::XRE_main 	nsAppRunner.cpp:3860
22 	libxul.so 	XRE_main 	nsAppRunner.cpp:3935
23 	b2g 	main 	nsBrowserApp.cpp:164
24 	libc.so 	__libc_init 	libc_init_dynamic.c:114
25 	libc.so 	__cxa_atexit 	atexit.c:99
26 		@0xbefeed24 	

Crash on Otoro for touch events... Not sure how to repro ATM.
Build ID	20130203070202
Comment 1 Gary Kwong [:gkw] [:nth10sd] 2013-03-13 19:41:05 PDT
Created attachment 724739
unreduced orangutan script that crashes at this location after ~20+ minutes

Here's an unreduced orangutan script testcase that results in this crash.
Comment 2 Gary Kwong [:gkw] [:nth10sd] 2013-03-13 19:43:02 PDT
I haven't tried to reproduce this crash with the testcase yet.
Comment 3 Gary Kwong [:gkw] [:nth10sd] 2013-03-13 19:45:33 PDT
I crashed on the following build: 20130311095652:

https://crash-stats.mozilla.com/report/index/1b26b023-bbc0-4976-b6d8-eab2b2130314
Comment 4 Lukas Blakk [:lsblakk] use ?needinfo 2013-03-18 08:12:09 PDT
We'll track this since it is reproducible. We should get an engineer to investigate further so we can get a sense of how likely users are to hit it. Assigning to David for delegation.
Comment 5 Anthony Ricaud (:rik) 2013-03-19 06:27:46 PDT
I let the script run over lunch (~1h30) and it didn't crash for me. I'm using 20130310230202.

Also, this is a Gecko crash so not my area of expertise.
Comment 6 Anthony Ricaud (:rik) 2013-03-19 07:46:17 PDT
Gabriele: Could you take a look?
Comment 7 Gabriele Svelto [:gsvelto] 2013-03-19 08:53:27 PDT
(In reply to Anthony Ricaud (:rik) from comment #6)
> Gabriele: Could you take a look?

Sure, will have a look at it ASAP.
Comment 8 Gary Kwong [:gkw] [:nth10sd] 2013-03-19 10:56:25 PDT
Some tips to help reproduce (as I recall from its initial run):

1. First reset your phone. Make sure it is on the beta channel without any test apps (e.g. Test Receiver etc.)
2. Boot up the phone without a SIM card, make sure it has no contacts, no extra apps on screen.
3. Only have wifi connected.
Comment 9 Gabriele Svelto [:gsvelto] 2013-04-03 09:23:30 PDT
I tried running the script provided in attachment 724739, following the instructions from comment 8 using a v1-train userdebug build on my Unagi. The script went past the ~30-minute mark without crashing. I'll try running the whole procedure again a couple more times to see if I can reproduce the issue; I'd be grateful if someone could add even more information on how to reproduce it reliably (and possibly in a shorter timespan).
Comment 10 Gabriele Svelto [:gsvelto] 2013-04-03 15:26:29 PDT
I've
Comment 11 Gabriele Svelto [:gsvelto] 2013-04-03 15:28:03 PDT
I've managed to reproduce a crash with the script and capture the event in GDB but the top stack frames look different than what we have here:

#0  0x4141a0e0 in mozilla::layers::GestureEventListener::HandleInputEvent (
    this=0x4863ba60, aEvent=...)
    at /home/gsvelto/projects/mozilla-b2g18/gfx/layers/ipc/GestureEventListener.cpp:159
#1  0x414151f6 in mozilla::layers::AsyncPanZoomController::HandleInputEvent (
    this=0x48f6d400, aEvent=...)
    at /home/gsvelto/projects/mozilla-b2g18/gfx/layers/ipc/AsyncPanZoomController.cpp:251
#2  0x414153bc in mozilla::layers::AsyncPanZoomController::ReceiveInputEvent (
    this=0x48f6d400, aEvent=...)
    at /home/gsvelto/projects/mozilla-b2g18/gfx/layers/ipc/AsyncPanZoomController.cpp:244
#3  0x414154c4 in mozilla::layers::AsyncPanZoomController::ReceiveMainThreadInputEvent (this=0x48f6d400, aEvent=...)
    at /home/gsvelto/projects/mozilla-b2g18/gfx/layers/ipc/AsyncPanZoomController.cpp:161
#4  0x40d2b5de in mozilla::layout::RenderFrameParent::NotifyInputEvent (
    this=<value optimized out>, aEvent=...)
    at /home/gsvelto/projects/mozilla-b2g18/layout/ipc/RenderFrameParent.cpp:782

I'll do some further investigation on this tomorrow.
Comment 12 Gary Kwong [:gkw] [:nth10sd] 2013-04-03 15:32:59 PDT
(In reply to Gabriele Svelto [:gsvelto] from comment #9)
> I tried running the script provided in attachment 724739,
> following the instructions from comment 8 using a v1-train userdebug build
> on my Unagi. The script went past the ~30 minutes mark without crashing.
> I'll try running the whole procedure again a couple more times to see if I
> can reproduce the issue; I'd be grateful if someone could add even more
> information on how to reproduce it reliably (and possibly in a shorter
> timespan).

Gabriele, thanks for being able to reproduce it! orangfuzz (the fuzzer that produced this test output, see bug 857276) and its running harness are still in their infancy - we're definitely thinking of making testcases more reliable and reducing them to smaller steps.

I'm glad this is now reproducible!
Comment 13 Gabriele Svelto [:gsvelto] 2013-04-04 03:21:19 PDT
Created attachment 733239
Backtrace for the segfault seen in comment #11

I've attached the gdb backtrace of the segmentation fault I hit while running the script. I have not yet fully understood what's going on; the crash is caused by the |mLongTapTimeoutTask| field pointing to what looks like a corrupt (or already freed?) object, so the |mLongTapTimeoutTask->Cancel()| call segfaults.

More importantly, I'm not sure whether this issue is the same as the one seen in this bug, though they both seem related to handling touch events and one might be causing the other. I'll try to reproduce it again today and see what I get.
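
For illustration only, here is a minimal sketch of the kind of lifetime problem that would match the symptom above, assuming the task field ends up pointing at an object the message loop has already released. It is not the Gecko code: CancelableTask, TaskLoop and GestureListener below are hypothetical stand-ins, and shared ownership plus clearing the field is just one possible way to avoid calling Cancel() on a stale pointer.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a cancelable task (NOT the Gecko class).
class CancelableTask {
public:
  explicit CancelableTask(std::function<void()> aFn) : mFn(std::move(aFn)) {}
  void Cancel() { mCanceled = true; }
  void Run() { if (!mCanceled && mFn) { mFn(); } }
private:
  std::function<void()> mFn;
  bool mCanceled = false;
};

// Hypothetical message loop: it drops its reference once a task has run.
// With raw pointers this is where a listener's field would go stale.
class TaskLoop {
public:
  void Post(std::shared_ptr<CancelableTask> aTask) { mQueue.push(std::move(aTask)); }
  void RunAll() {
    while (!mQueue.empty()) {
      std::shared_ptr<CancelableTask> task = std::move(mQueue.front());
      mQueue.pop();
      task->Run();  // 'task' keeps the object alive for the duration of Run()
    }
  }
private:
  std::queue<std::shared_ptr<CancelableTask>> mQueue;
};

// Hypothetical listener that shares ownership of its pending long-tap task
// and clears the field whenever the task fires or is cancelled.
class GestureListener {
public:
  explicit GestureListener(TaskLoop& aLoop) : mLoop(aLoop) {}
  ~GestureListener() { CancelLongTapTimeout(); }  // the task captures 'this'

  void OnTouchStart() {
    CancelLongTapTimeout();
    mLongTapTimeoutTask = std::make_shared<CancelableTask>([this] {
      assert(mLongTapTimeoutTask);   // a cancelled task never reaches here
      ++mLongTapCount;
      mLongTapTimeoutTask.reset();   // fired: forget it so nobody cancels it later
    });
    mLoop.Post(mLongTapTimeoutTask);
  }

  void OnTouchEnd() { CancelLongTapTimeout(); }

  int LongTapCount() const { return mLongTapCount; }
  bool HasPendingLongTap() const { return static_cast<bool>(mLongTapTimeoutTask); }

private:
  void CancelLongTapTimeout() {
    if (mLongTapTimeoutTask) {
      mLongTapTimeoutTask->Cancel();
      mLongTapTimeoutTask.reset();
    }
  }

  TaskLoop& mLoop;
  std::shared_ptr<CancelableTask> mLongTapTimeoutTask;
  int mLongTapCount = 0;
};
```

In this sketch a touch start followed by a touch end cancels the pending task before the loop runs it, and a task that has already fired clears the field itself, so a later OnTouchEnd() has nothing stale to cancel. Whether the real crash follows this pattern is unconfirmed.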
Comment 14 Gabriele Svelto [:gsvelto] 2013-04-04 10:46:11 PDT
I did another run today with an updated build and the script ran without problems for over two hours. Unfortunately this seems really hard to reproduce :-(
Comment 15 Jason Smith [:jsmith] 2013-04-08 22:07:53 PDT
Sounds like what we are really looking for here is a more consistent STR. Flipping keywords as such to reflect this.
Comment 16 Daniel Veditz [:dveditz] 2013-04-09 10:20:55 PDT
The initial stack and comment 13 are exploitable conditions. Giving a lower "moderate" rating on the belief (hope) attackers can't create the sequence of user touches required to get into this state.
Comment 17 Gabriele Svelto [:gsvelto] 2013-05-10 08:14:17 PDT
I was unable to make further progress on this so I'm unassigning the bug.
Comment 18 Jason Smith [:jsmith] 2013-05-10 17:04:55 PDT
Gary - Can you take another shot at reproducing this? We might have to close if we don't have a way to make this actionable.
Comment 19 Gary Kwong [:gkw] [:nth10sd] 2013-05-11 18:06:42 PDT
I don't have any progress on this yet. Please feel free to close - I'll be sure to open a new bug when I have something more concrete.
Comment 20 Jason Smith [:jsmith] 2013-05-11 18:29:17 PDT
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #19)
> I don't have any progress on this yet. Please feel free to close - I'll be
> sure to open a new bug when I have something more concrete.

Sounds good.
Comment 21 Gary Kwong [:gkw] [:nth10sd] 2013-05-11 18:31:04 PDT
I'd say WFM - running the testcase no longer seems to throw that crash. At least not till a better testcase comes around.
Comment 23 Naoki Hirata :nhirata (please use needinfo instead of cc) 2013-07-23 17:01:04 PDT
It looks like it's happening on 1.0.1 and not 1.1?  We need to be able to query this better.
Comment 24 Scoobidiver (away) 2013-07-24 00:26:24 PDT
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #23)
> It looks like it's happening on 1.0.1 and not 1.1?  We need to be able to
> query this better.
We don't know whether it still happens on 1.1 because there are not enough users of that version on devices. Upgrading Geeksphone users (considered as Beta testers) to that version would help.
Comment 25 Robert Kaiser (not working on stability any more) 2013-07-26 06:28:21 PDT
This seems to appear on 1.0.1 only for now, from what I see (but we have very little data on 1.1) - has bug 833964 fixed this one as well?
Comment 26 Robert Kaiser (not working on stability any more) 2013-08-04 05:22:48 PDT
Kats, you fixed bug 833964, does this look like it's just a dupe or a different issue?
Comment 27 Kartikaya Gupta (email:kats@mozilla.com) 2013-08-04 06:31:22 PDT
From the stack in comment 0 it does look like it could be a dupe. I'm not sure about the crash that :gsvelto reproduced in comment 13, that looks different.
Comment 28 Scoobidiver (away) 2013-08-07 07:08:18 PDT
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> I'm not sure about the crash that :gsvelto reproduced in comment 13, that looks
> different.
There have been no crashes for the last four weeks with the mozilla::layers::GestureEventListener::HandleInputEvent signature.
Comment 29 Mats Palmgren (:mats) 2014-07-13 12:25:54 PDT
Is this bug still relevant?  There are a few recently reported crashes but they are
all on 1.0.1.0-prerelease afaict.  Wontfix/worksforme?
Comment 30 Naoki Hirata :nhirata (please use needinfo instead of cc) 2014-07-14 22:07:40 PDT
Crashes look like they are all on 18 only.  Closing as WFM.