Closed Bug 952170 Opened 11 years ago Closed 11 years ago

FX OS crash in mozilla::dom::TabChild::UpdateTapState(mozilla::WidgetTouchEvent const&, nsEventStatus)

Categories

(Core :: DOM: Core & HTML, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla29
blocking-b2g 1.3+
Tracking Status
firefox27 --- wontfix
firefox28 --- fixed
firefox29 --- fixed
b2g-v1.2 --- wontfix
b2g-v1.3 --- fixed
b2g-v1.4 --- fixed

People

(Reporter: marcia, Assigned: kats)

References

Details

(4 keywords, Whiteboard: [b2g-crash] [CR 593145])

Crash Data

Attachments

(3 files, 2 obsolete files)

This bug was filed from the Socorro interface and is report bp-af8f8ccd-8e1c-4fc8-8322-64b1f2131219. ============================================================= Seen on Buri while running the latest nightly. Gaia e6276c96fcf087a12800419be8a9e25fcbbc924d SourceStamp 5c7fa2bfea8b BuildID 20131219040202 Version 29.0a1 STR: 1. Last thing done on the phone was operating in gallery (editing a photo). Put phone in my pocket and when I retrieved it received this crash. No logcat available since I was not tethered at the time.
This has to be a regression from something that landed yesterday.
blocking-b2g: --- → 1.4?
Vivien - Did bug 950489 cause this crash? It appears to have touched the TabChild.cpp file within the regression range that matches the crash stack.
Flags: needinfo?(21)
I assume it should be related. Will try to dig into it. I can reproduce with Naoki's step from one of the dup. STR: Occurred when trying to update a contact with a picture really quick after creating a new contact with an image from the gallery. (ie. gallery opened / closed and tried to open it again )
Flags: needinfo?(21)
Blocks: 950489
blocking-b2g: 1.4? → 1.3?
blocking-b2g: 1.3? → 1.3+
With bug 949347 now fixed, this crash is being hit instead during our smoketests, so this is now a smoketest blocker.
Keywords: smoketest
I don't think this will fix the problem but unnecessary casts like this bug me.
Attachment #8356078 - Flags: review?(21)
I suspect this might be the problem. WidgetTouchEvent contains an nsTArray but has no explicit copy constructor, so when we do things like WidgetTouchEvent localEvent(aEvent) I think the nsTArray doesn't get copied properly. Specifically I don't think the refcount of the touches in the list gets incremented, and so they could get deleted out from under us, resulting in a dangling pointer and this error. I haven't confirmed this theory but putting the patch up anyway - it should be safe to land even if this isn't the problem. Try push for builds at https://tbpl.mozilla.org/?tree=Try&rev=e3245179b723
Attachment #8356087 - Flags: review?(21)
I feel a bit uncomfortable reviewing those. Is Chris Lord a good reviewer for those?
Attachment #8356078 - Flags: review?(21) → review+
Comment on attachment 8356087 [details] [diff] [review] Add explicit copy constructor Could you rather fix AssignTouchEventData and disabled copy-ctor. Or if that ends up being tricky for some reason, fix AssignTouchEventData and use that in copy-ctor.
Attachment #8356087 - Flags: review?(21) → review-
Comment on attachment 8356087 [details] [diff] [review] Add explicit copy constructor Or actually, this setup is rather error prone already, and the patch doesn't make it any worse. I'll file a followup to clean up this.
Attachment #8356087 - Flags: review- → review+
Also I screwed up that try push. Better one here: https://tbpl.mozilla.org/?tree=Try&rev=9679bb7f7957
Kats, Is anything pending to land for the solution to be completed here?
Flags: needinfo?(bugmail.mozilla)
Not really. But I don't know if the patches I landed will actually fix the crash or not. If the STR are reliable can somebody from QA test a build with these patches to see if it fixes the problem?
Flags: needinfo?(bugmail.mozilla)
Issue appears to occur reliably if user scrolls when loading any app that has ability to horizontally or vertically scroll (e.g. Calender, Gallery, Homescreen [from lockscreen])
Whoops, I probably should have marked this as no_uplift until we verified the fix. Ah well the uplift shouldn't hurt. But we'll still need to verify the fix.
We just confirmed that the patches here didn't fix the crash, so I'm reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Kats, To make sure this blocker is on your radar I am ni you. As you know this is a QC blocker and they are expecting a fix by 1/10/14.
Flags: needinfo?(bugmail.mozilla)
Jason, did you try backing out bug 950489 to see if it fixes the crash?
Flags: needinfo?(bugmail.mozilla) → needinfo?(jsmith)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #25) > Jason, did you try backing out bug 950489 to see if it fixes the crash? Not yet.
Flags: needinfo?(jsmith)
Interesting to note that this crash has actually been around for a long time. See for instance https://crash-stats.mozilla.com/report/index/4535287d-a194-4524-a005-649e32131215 which occurred on B2G-18 build from 20130621133927. The frequency just spiked way up on Dec 19, probably from the landing of bug 950489. Backing that out will probably reduce the frequency back to old levels but won't fix it completely.
Crash Signature: [@ mozilla::dom::TabChild::UpdateTapState(mozilla::WidgetTouchEvent const&, nsEventStatus)] → [@ mozilla::dom::TabChild::UpdateTapState(mozilla::WidgetTouchEvent const&, nsEventStatus)] [@ mozilla::dom::TabChild::UpdateTapState ]
Also, this crash should only happen when APZC is disabled, and as of last night APZC is enabled by default so this crash shouldn't get very often. Nonetheless I will spend some time today to see if I can figure it out.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31) > Also, this crash should only happen when APZC is disabled, and as of last > night APZC is enabled by default so this crash shouldn't get very often. > Nonetheless I will spend some time today to see if I can figure it out. Per talking w/Marcia - she mentioned on a build with APZC enabled, the crash rate is definitely lower. However, it's still possible to hit in Gaia. She can add more details on the reproduction steps.
Here are the steps from my crash today: 1. Incoming call log, select a new number and "Create a new contact" 2. Contacts app: Start typing a name in the first name field, and then backspace a few times using the delete key on the keyboard
No longer a smoketest blocker with turning APZC on.
Keywords: smoketest
Attached patch bug952170.patch (obsolete) — Splinter Review
I put in a patch just to null check and see what was going on. without APZ turned on, I got an error : 01-09 17:09:32.759: E/GeckoConsole(1653): [JavaScript Error: "NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIContentFrameMessageManager.content]" {file: "chrome://global/content/BrowserElementChildPreload.js" line: 737}] And I did not crash. Having said that I don't think I fixed it, something else must have fixed it. We should probably check in tomorrow's build if we crash or not without APZ turned on. If we still do, we can try creating a new build and seeing if someone else crashes with this patch? Gaia a5288df8de82cd84be14391c96f99a140535a78e Gecko 78ebe7f47e81d6ffbaff3ff12f32597aca457205 BuildID 20140109155817 Version 29.0a1 ro.build.version.incremental=eng.archermind.20131114.105818 ro.build.date=Thu Nov 14 10:58:33 CST 2013
Apparently, this still crashed even with APZ turned on according to the smoke testers.
Kats, Please prioritize this blocker. it blocks QC
Flags: needinfo?(bugmail.mozilla)
I have not yet been able to reproduce this. I'll keep trying.
Flags: needinfo?(bugmail.mozilla)
Comment on attachment 8358119 [details] [diff] [review] bug952170.patch Review of attachment 8358119 [details] [diff] [review]: ----------------------------------------------------------------- I assume that since you didn't mention it, you never saw the "touch is null" output get hit, which means that the null check here never triggered at all. So I don't think this patch will have much effect. Also according to the crashreports the crashes were a SIGBUS rather than a SIGSEGV so I don't think it's a null pointer dereference that's causing the problem.
Attachment #8358119 - Flags: review?(bugmail.mozilla) → review-
(In reply to Marcia Knous [:marcia - use needinfo] from comment #33) > Here are the steps from my crash today: > > 1. Incoming call log, select a new number and "Create a new contact" > 2. Contacts app: Start typing a name in the first name field, and then > backspace a few times using the delete key on the keyboard Another way I was able to reproduce it three times in a row using today's build involved installing the XE Currency app and then trying to add a currency by Country - typing a few characters in the text field caused the crash to happen three times: https://crash-stats.mozilla.com/report/index/58d37b4e-53c2-40a2-9b89-9e7d82140113
Caveat: Slightly older build and on Peak https://crash-stats.mozilla.com/report/index/fd0896c7-ec57-4f67-b35e-7de842140113 STR: 1. Tap settings icon to load app. 2. Immediately tap the screen again before the app loads fully.
I think I can reproduce it now, using the STR in comment 41 along with a call to nsDOMWindowUtils::GarbageCollectNow() just before the UpdateTapState call at http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabChild.cpp?rev=49c6393d37f3#1875 Investigating...
Attached patch Patch (obsolete) — Splinter Review
Well that was a bit anticlimactic. DispatchWidgetEvent was removing all the touch points from the event, so when we did event.touches[0] we got back a pointer that had a 0x1 value. Dereferencing that caused the SIGBUS. The GarbageCollect() call I mentioned above seems unrelated except that maybe it slows down things a bit and widens the window in which I could trigger the bug.
Attachment #8359361 - Flags: review?(bugs)
I am hitting this crash with a 100% repro rate by replying to an email in the Email app and typing in the body field. The keyboard disappears after a few characters are typed. When this process of selecting the body and attempting to type is repeated 1 to 4 more times this crash is experienced. Environmental Variables Device: Buri v 1.4 Mozilla RIL Build ID: 20140113040202 Gecko: http://hg.mozilla.org/mozilla-central/rev/12d3ba62a599 Gaia: fd3b9a97cb3c41cfa56be387b46a51db136b4422 Platform Version: 29.0a1 Firmware Version: V1.2-device.cfg
Comment on attachment 8359361 [details] [diff] [review] Patch I think we should just null check, since if we remove stuff from the touch list, we didn't dispatch events for the touch.
Attachment #8359361 - Flags: review?(bugs) → review-
Attached patch Patch v2Splinter Review
Ok, that's even easier then.
Attachment #8359361 - Attachment is obsolete: true
Attachment #8359400 - Flags: review?(bugs)
Attachment #8359400 - Flags: review?(bugs) → review+
(In reply to Martijn Wargers [:mwargers] (QA) from comment #47) > I get this crash after a while, while running: > http://people.mozilla.org/~mwargers/tests/contenteditable/ > contenteditable_focus_blur_parentframe.htm Presumably this is without the patch?
Status: REOPENED → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Attachment #8358119 - Attachment is obsolete: true
Attachment #8359400 - Flags: checkin+
Whiteboard: [b2g-crash] → [b2g-crash] [CR 593145]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: