FX OS crash in mozilla::dom::TabChild::UpdateTapState(mozilla::WidgetTouchEvent const&, nsEventStatus)

RESOLVED FIXED in Firefox 28, Firefox OS v1.3

Status

()

Core
DOM
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: marcia, Assigned: kats)

Tracking

(4 keywords)

unspecified
mozilla29
ARM
Gonk (Firefox OS)
crash, regression, reproducible, topcrash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:1.3+, firefox27 wontfix, firefox28 fixed, firefox29 fixed, b2g-v1.2 wontfix, b2g-v1.3 fixed, b2g-v1.4 fixed)

Details

(Whiteboard: [b2g-crash] [CR 593145], crash signature)

Attachments

(3 attachments, 2 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-af8f8ccd-8e1c-4fc8-8322-64b1f2131219.
=============================================================

Seen on Buri while running the latest nightly.

Gaia   e6276c96fcf087a12800419be8a9e25fcbbc924d
SourceStamp 5c7fa2bfea8b
BuildID 20131219040202
Version 29.0a1

STR:
1. Last thing done on the phone was operating in gallery (editing a photo). Put phone in my pocket and when I retrieved it received this crash.

No logcat available since I was not tethered at the time.
Duplicate of this bug: 952195
Duplicate of this bug: 952193
This has to be a regression from something that landed yesterday.
blocking-b2g: --- → 1.4?
Keywords: regression, steps-wanted
Vivien - Did bug 950489 cause this crash? It appears to have touched the TabChild.cpp file within the regression range that matches the crash stack.
Flags: needinfo?(21)
I assume it should be related. Will try to dig into it. I can reproduce with Naoki's step from one of the dup.

STR: 
Occurred when trying to update a contact with a picture really quick after creating a new contact with an image from the gallery.
(ie. gallery opened / closed and tried to open it again )
Flags: needinfo?(21)

Updated

5 years ago
status-b2g-v1.3: --- → affected

Updated

5 years ago
Blocks: 950489
blocking-b2g: 1.4? → 1.3?

Updated

5 years ago
Keywords: steps-wanted → reproducible
Blocks: 942267
blocking-b2g: 1.3? → 1.3+
With bug 949347 now fixed, this crash is being hit instead during our smoketests, so this is now a smoketest blocker.
Keywords: smoketest
Created attachment 8356078 [details] [diff] [review]
Remove unnecessary cast

I don't think this will fix the problem but unnecessary casts like this bug me.
Attachment #8356078 - Flags: review?(21)
Created attachment 8356087 [details] [diff] [review]
Add explicit copy constructor

I suspect this might be the problem. WidgetTouchEvent contains an nsTArray but has no explicit copy constructor, so when we do things like WidgetTouchEvent localEvent(aEvent) I think the nsTArray doesn't get copied properly. Specifically I don't think the refcount of the touches in the list gets incremented, and so they could get deleted out from under us, resulting in a dangling pointer and this error. I haven't confirmed this theory but putting the patch up anyway - it should be safe to land even if this isn't the problem.

Try push for builds at https://tbpl.mozilla.org/?tree=Try&rev=e3245179b723
Attachment #8356087 - Flags: review?(21)
I feel a bit uncomfortable reviewing those. Is Chris Lord a good reviewer for those?
/me reviews

Updated

5 years ago
Attachment #8356078 - Flags: review?(21) → review+
Comment on attachment 8356087 [details] [diff] [review]
Add explicit copy constructor

Could you rather fix AssignTouchEventData and
disabled copy-ctor.
Or if that ends up being tricky for some reason, fix AssignTouchEventData
and use that in copy-ctor.
Attachment #8356087 - Flags: review?(21) → review-
Comment on attachment 8356087 [details] [diff] [review]
Add explicit copy constructor

Or actually, this setup is rather error prone already, and the patch
doesn't make it any worse.
I'll file a followup to clean up this.
Attachment #8356087 - Flags: review- → review+
Also I screwed up that try push. Better one here:

https://tbpl.mozilla.org/?tree=Try&rev=9679bb7f7957
Kats,

Is anything pending to land for the solution to be completed here?
Flags: needinfo?(bugmail.mozilla)
Not really. But I don't know if the patches I landed will actually fix the crash or not. If the STR are reliable can somebody from QA test a build with these patches to see if it fixes the problem?
Flags: needinfo?(bugmail.mozilla)

Comment 17

5 years ago
STR
Issue appears to occur reliably if user scrolls when loading any app that has ability to horizontally or vertically scroll (e.g. Calender, Gallery, Homescreen [from lockscreen])

Updated

5 years ago
Duplicate of this bug: 956864
https://hg.mozilla.org/mozilla-central/rev/343ac499c993
https://hg.mozilla.org/mozilla-central/rev/bf2f2356f3b6
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
https://hg.mozilla.org/releases/mozilla-aurora/rev/758629e75ddf
https://hg.mozilla.org/releases/mozilla-aurora/rev/d8852933a8b4
Assignee: nobody → bugmail.mozilla
status-b2g-v1.3: affected → fixed
status-firefox27: --- → fixed
status-firefox28: --- → fixed
status-firefox29: --- → fixed
Whoops, I probably should have marked this as no_uplift until we verified the fix. Ah well the uplift shouldn't hurt. But we'll still need to verify the fix.
status-firefox27: fixed → ---
We just confirmed that the patches here didn't fix the crash, so I'm reopening.
Status: RESOLVED → REOPENED
status-b2g-v1.3: fixed → ---
status-firefox28: fixed → ---
status-firefox29: fixed → ---
Resolution: FIXED → ---
Kats,

To make sure this blocker is on your radar I am ni you.

As you know this is a QC blocker and they are expecting a fix by 1/10/14.
Flags: needinfo?(bugmail.mozilla)
Jason, did you try backing out bug 950489 to see if it fixes the crash?
Flags: needinfo?(bugmail.mozilla) → needinfo?(jsmith)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #25)
> Jason, did you try backing out bug 950489 to see if it fixes the crash?

Not yet.
Flags: needinfo?(jsmith)

Updated

5 years ago
Duplicate of this bug: 957207

Updated

5 years ago
Duplicate of this bug: 956891
status-b2g-v1.3: --- → affected

Updated

5 years ago
Duplicate of this bug: 958052
Interesting to note that this crash has actually been around for a long time. See for instance https://crash-stats.mozilla.com/report/index/4535287d-a194-4524-a005-649e32131215 which occurred on B2G-18 build from 20130621133927. The frequency just spiked way up on Dec 19, probably from the landing of bug 950489. Backing that out will probably reduce the frequency back to old levels but won't fix it completely.
Crash Signature: [@ mozilla::dom::TabChild::UpdateTapState(mozilla::WidgetTouchEvent const&, nsEventStatus)] → [@ mozilla::dom::TabChild::UpdateTapState(mozilla::WidgetTouchEvent const&, nsEventStatus)] [@ mozilla::dom::TabChild::UpdateTapState ]
Also, this crash should only happen when APZC is disabled, and as of last night APZC is enabled by default so this crash shouldn't get very often. Nonetheless I will spend some time today to see if I can figure it out.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31)
> Also, this crash should only happen when APZC is disabled, and as of last
> night APZC is enabled by default so this crash shouldn't get very often.
> Nonetheless I will spend some time today to see if I can figure it out.

Per talking w/Marcia - she mentioned on a build with APZC enabled, the crash rate is definitely lower. However, it's still possible to hit in Gaia. She can add more details on the reproduction steps.
Here are the steps from my crash today:

1. Incoming call log, select a new number and "Create a new contact"
2. Contacts app: Start typing a name in the first name field, and then backspace a few times using the delete key on the keyboard
No longer a smoketest blocker with turning APZC on.
Keywords: smoketest
Created attachment 8358119 [details] [diff] [review]
bug952170.patch

I put in a patch just to null check and see what was going on.  without APZ turned on, I got an error : 
01-09 17:09:32.759: E/GeckoConsole(1653): [JavaScript Error: "NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIContentFrameMessageManager.content]" {file: "chrome://global/content/BrowserElementChildPreload.js" line: 737}]

And I did not crash.  Having said that I don't think I fixed it, something else must have fixed it.  We should probably check in tomorrow's build if we crash or not without APZ turned on.  If we still do, we can try creating a new build and seeing if someone else crashes with this patch?

Gaia      a5288df8de82cd84be14391c96f99a140535a78e
Gecko     78ebe7f47e81d6ffbaff3ff12f32597aca457205
BuildID   20140109155817
Version   29.0a1
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013
Apparently, this still crashed even with APZ turned on according to the smoke testers.
Kats,

Please prioritize this blocker. it blocks QC
Flags: needinfo?(bugmail.mozilla)
I have not yet been able to reproduce this. I'll keep trying.
Flags: needinfo?(bugmail.mozilla)
Attachment #8356078 - Flags: checkin+
Attachment #8356087 - Flags: checkin+
Comment on attachment 8358119 [details] [diff] [review]
bug952170.patch

Review of attachment 8358119 [details] [diff] [review]:
-----------------------------------------------------------------

I assume that since you didn't mention it, you never saw the "touch is null" output get hit, which means that the null check here never triggered at all. So I don't think this patch will have much effect.

Also according to the crashreports the crashes were a SIGBUS rather than a SIGSEGV so I don't think it's a null pointer dereference that's causing the problem.
Attachment #8358119 - Flags: review?(bugmail.mozilla) → review-
(In reply to Marcia Knous [:marcia - use needinfo] from comment #33)
> Here are the steps from my crash today:
> 
> 1. Incoming call log, select a new number and "Create a new contact"
> 2. Contacts app: Start typing a name in the first name field, and then
> backspace a few times using the delete key on the keyboard

Another way I was able to reproduce it three times in a row using today's build involved installing the XE Currency app and then trying to add a currency by Country - typing a few characters in the text field caused the crash to happen three times: https://crash-stats.mozilla.com/report/index/58d37b4e-53c2-40a2-9b89-9e7d82140113

Comment 41

5 years ago
Caveat: Slightly older build and on Peak

https://crash-stats.mozilla.com/report/index/fd0896c7-ec57-4f67-b35e-7de842140113

STR:
1. Tap settings icon to load app.
2. Immediately tap the screen again before the app loads fully.
I think I can reproduce it now, using the STR in comment 41 along with a call to nsDOMWindowUtils::GarbageCollectNow() just before the UpdateTapState call at http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp?rev=49c6393d37f3#1875

Investigating...
Created attachment 8359361 [details] [diff] [review]
Patch

Well that was a bit anticlimactic. DispatchWidgetEvent was removing all the touch points from the event, so when we did event.touches[0] we got back a pointer that had a 0x1 value. Dereferencing that caused the SIGBUS.

The GarbageCollect() call I mentioned above seems unrelated except that maybe it slows down things a bit and widens the window in which I could trigger the bug.
Attachment #8359361 - Flags: review?(bugs)
I am hitting this crash with a 100% repro rate by replying to an email in the Email app and typing in the body field. The keyboard disappears after a few characters are typed. When this process of selecting the body and attempting to type is repeated 1 to 4 more times this crash is experienced. 

Environmental Variables
Device: Buri v 1.4 Mozilla RIL
Build ID: 20140113040202
Gecko: http://hg.mozilla.org/mozilla-central/rev/12d3ba62a599
Gaia: fd3b9a97cb3c41cfa56be387b46a51db136b4422
Platform Version: 29.0a1
Firmware Version: V1.2-device.cfg
Comment on attachment 8359361 [details] [diff] [review]
Patch

I think we should just null check, since if we remove stuff from the touch list,
we didn't dispatch events for the touch.
Attachment #8359361 - Flags: review?(bugs) → review-
Created attachment 8359400 [details] [diff] [review]
Patch v2

Ok, that's even easier then.
Attachment #8359361 - Attachment is obsolete: true
Attachment #8359400 - Flags: review?(bugs)

Updated

5 years ago
Attachment #8359400 - Flags: review?(bugs) → review+
(In reply to Martijn Wargers [:mwargers] (QA) from comment #47)
> I get this crash after a while, while running:
> http://people.mozilla.org/~mwargers/tests/contenteditable/
> contenteditable_focus_blur_parentframe.htm

Presumably this is without the patch?
Status: REOPENED → ASSIGNED
status-b2g-v1.2: --- → wontfix
status-b2g-v1.4: --- → affected
status-firefox27: --- → wontfix
status-firefox28: --- → affected
status-firefox29: --- → affected
https://hg.mozilla.org/mozilla-central/rev/a490b036e643
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Attachment #8358119 - Attachment is obsolete: true
Attachment #8359400 - Flags: checkin+
https://hg.mozilla.org/releases/mozilla-aurora/rev/f1e418585203
status-b2g-v1.3: affected → fixed
status-b2g-v1.4: affected → fixed
status-firefox28: affected → fixed
status-firefox29: affected → fixed
Whiteboard: [b2g-crash] → [b2g-crash] [CR 593145]
You need to log in before you can comment on or make changes to this bug.