Closed
Bug 1146987
Opened 8 years ago
Closed 8 years ago
Crash in Messaging app while stability testing
Categories
(Core Graveyard :: Widget: Gonk, defect)
Tracking
(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.1 unaffected, b2g-v2.2 fixed, b2g-master fixed)
People
(Reporter: ggrisco, Assigned: kats)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 588][caf priority: p1][CR 812377])
Crash Data
Attachments
(18 files, 1 obsolete file)
143.65 KB,
text/plain
|
Details | |
449.71 KB,
text/plain
|
Details | |
140.91 KB,
text/plain
|
Details | |
463.53 KB,
text/plain
|
Details | |
143.62 KB,
text/plain
|
Details | |
483.84 KB,
text/plain
|
Details | |
138.29 KB,
text/plain
|
Details | |
485.62 KB,
text/plain
|
Details | |
3.25 KB,
patch
|
Details | Diff | Splinter Review | |
103.19 KB,
text/plain
|
Details | |
376.01 KB,
text/plain
|
Details | |
102.57 KB,
text/plain
|
Details | |
393.48 KB,
text/plain
|
Details | |
1.68 KB,
patch
|
mchang
:
review+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
127.83 KB,
text/plain
|
Details | |
500.60 KB,
text/plain
|
Details | |
99.52 KB,
text/plain
|
Details | |
484.91 KB,
text/plain
|
Details |
Test steps: 1. Goto Messages app 2. Type the message and the number to which it has to be sent. 3. Long press the send message icon 4. You will find message just crashed pop-up. I'm not sure how repeatable this is. We've seen it twice so far on latest build. Cafbot will upload logs.
Comment 1•8 years ago
|
||
We especially need the stack trace...
Updated•8 years ago
|
Whiteboard: [CR 812377] → [caf priority: p1][CR 812377]
Updated•8 years ago
|
Whiteboard: [caf priority: p1][CR 812377] → [b2g-crash][caf-crash 588][caf priority: p1][CR 812377]
Comment 2•8 years ago
|
||
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110 Moz BuildID: 20150322002503 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.110.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=44c62060581fde8de1e12e94cf55e9673b401a47 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=c36e800ae5e5e7e790129e18edaef202d8d3dbff Patches: bug 1133147, bug 1139469, bug 1053634, bug 1142770
Comment 3•8 years ago
|
||
Comment 4•8 years ago
|
||
Comment 5•8 years ago
|
||
Hey Kats, does this make some sense to you?
Flags: needinfo?(bugmail.mozilla)
Comment 6•8 years ago
|
||
Comms Triage: We would like to know if this crash can be reproduced on a Flame before making a call on the blocker status.
Assignee | ||
Comment 7•8 years ago
|
||
From the stack trace and crash address it looks like a vsync is happening and we are trying to dispatch move events after the GeckTouchDispatcher has been freed, maybe. Definitely a platform bug anyway, so moving components. I can look into it but off the top of my head I'm not how we could run into this.
Assignee: nobody → bugmail.mozilla
Component: Gaia::SMS → Widget: Gonk
Flags: needinfo?(bugmail.mozilla)
Product: Firefox OS → Core
Version: unspecified → 37 Branch
Comment 8•8 years ago
|
||
What device is this on? It seems odd from the stack trace that this is using the SoftwareDisplay rather than the GonkDisplay, which means we fell back to software vsync somehow. Is this on an L device?
Reporter | ||
Comment 9•8 years ago
|
||
(In reply to Mason Chang [:mchang] from comment #8) > software vsync somehow. Is this on an L device? This is happening on L-based 8909.
I couldn't crash on the Nexus L; it has a different chipset. I think it depends on the chipset? I can let mason borrow my Nexus L; though if we can't crash with it, we'll need a device that has that chipset. Otherwise we (Mozilla Dev/QA)'re blocked completely from looking at this issue.
Comment 11•8 years ago
|
||
Comment 12•8 years ago
|
||
Reporter | ||
Comment 13•8 years ago
|
||
Test team found the same crash by downloading "Sprint Club Nitro" from Marketplace and playing the game. I just had cafbot attach the logs for it. Hope that helps.
Comment 14•8 years ago
|
||
Does this still occur without the custom patches in comment 2?
Assignee | ||
Comment 15•8 years ago
|
||
So the line at [1] is the one that's supposedly crashing. That this line is running at all means mResamplingEnabled is false, because of the check on line 174. But if mResamplingEnabled is false and we survived the early return on line 95, that means gfxPrefs::HardwareVsyncEnabled() must be false. So how are we getting vsync notifications? Something here doesn't add up.... http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=widget/gonk/GeckoTouchDispatcher.cpp;h=2146c41ad52bd31490af9050b632896726c8cd65;hb=c36e800ae5e5e7e790129e18edaef202d8d3dbff#l193
Reporter | ||
Comment 16•8 years ago
|
||
(In reply to Mason Chang [:mchang] from comment #14) > Does this still occur without the custom patches in comment 2? It was observed on the same exact build, so yes.
Comment 17•8 years ago
|
||
What's also confusing is that the stack trace says we're calling DispatchTouchMoves from the display object. We should be calling it from the Compositor thread - http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=gfx/layers/ipc/CompositorParent.cpp;h=f5f664f660a5653debb12994606dc62c67922609;hb=c36e800ae5e5e7e790129e18edaef202d8d3dbff#l340
Assignee | ||
Comment 18•8 years ago
|
||
I think the SoftwareDisplay symbol is bogus. I've seen issues before where the symbol names for tasks aren't correct. I think what's happening is that is resampling is actually enabled, and we're going through the path at http://mxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp?rev=49bbfe7887d5#132
Assignee | ||
Comment 19•8 years ago
|
||
err I meant resampling is actually disabled.
Assignee | ||
Comment 20•8 years ago
|
||
Can you guys run with this patch and post logs if you hit it again? Also you should be able to unapply the "MONKEY" logging patch from bug 1139469 since we already fixed that. I see output from that patch in both of the logs posted to this bug.
Comment 22•8 years ago
|
||
The logging patch was added to our builds. cafbot should comment when monkey crashes again.
Flags: needinfo?(ntroast)
Comment 23•8 years ago
|
||
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.113 Moz BuildID: 20150323002504 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.113.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7f367fc98ffdd183f21d2cdfe20556ab877ece34 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=3970b1e517b6447237421c1f9e7845c00bde0838 Patches: bug 1133147, bug 1139469, bug 1145724, bug 1142770, bug 1146987
Comment 24•8 years ago
|
||
Comment 25•8 years ago
|
||
Assignee | ||
Comment 26•8 years ago
|
||
This latest dump doesn't have any of the new logging I added, and it still has the old compositor loop logging which I asked to be removed.
Reporter | ||
Comment 27•8 years ago
|
||
The change didn't make it in time to be built in AU113. It should come in next AU though.
Flags: needinfo?(ggrisco)
Comment 28•8 years ago
|
||
Sorry about that it should be included in AU 114+
Comment 29•8 years ago
|
||
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.114 Moz BuildID: 20150323002504 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.114.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7f367fc98ffdd183f21d2cdfe20556ab877ece34 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=3970b1e517b6447237421c1f9e7845c00bde0838 Patches: bug 1143694, bug 1146987, bug 1145724, bug 1133147, bug 1139469, bug 1142770
Comment 30•8 years ago
|
||
Comment 31•8 years ago
|
||
Assignee | ||
Comment 32•8 years ago
|
||
This log does have my new logging output, and it indicates that resampling is disabled and confirms what I said in comment 18-19. However it refutes my hypothesis that the GeckoTouchDispatcher was getting destroyed. So I'm not sure what the problem here is, will need to come up with a new hypothesis.
Assignee | ||
Comment 33•8 years ago
|
||
My new hypothesis is that we're getting bad input into the GeckoTouchDispatcher. I think this can happen if we get a end/cancel event with one touch point, followed immediately by a move event (which is bad input). In this scenario, the end/cancel event will go through [1] on the libui thread, and then await processing on the controller thread. Let's say that while the task is awaiting processing, we get the move event, which puts the input into the mTouchMoveEvents at [2] and also dispatches a task to the controller thread a few lines down. Now the first task (end/cancel event) runs, which clears out the mTouchMoveEvents at [3]. Then the move task runs which tries to grab an item from mTouchMoveEvents (which is empty) at [4]. I'll add more logging to see if I can confirm this hypothesis. [1] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp?rev=49bbfe7887d5#139 [2] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp?rev=49bbfe7887d5#126 [3] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp?rev=49bbfe7887d5#329 [4] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp?rev=49bbfe7887d5#195
Assignee | ||
Comment 34•8 years ago
|
||
Updated logging patch to test hypothesis in previous comment. Please apply this in place of the previous one. Thanks!
Attachment #8583265 -
Attachment is obsolete: true
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(ntroast)
Comment 35•8 years ago
|
||
The new patch was uploaded by ggrisco. It may not make it into the next AU, but it should be coming.
Flags: needinfo?(ntroast)
is bug 1148636 related?
Assignee | ||
Comment 37•8 years ago
|
||
It's probably a dup but it might not be. Making a dependency for now.
Blocks: 1148636
Comment 38•8 years ago
|
||
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.115 Moz BuildID: 20150323002504 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.115.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7f367fc98ffdd183f21d2cdfe20556ab877ece34 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=3970b1e517b6447237421c1f9e7845c00bde0838 Patches: bug 1133398, bug 1143694, bug 1146987, bug 1145724, bug 1147646, bug 1133147, bug 1139469, bug 1142770
Comment 39•8 years ago
|
||
Comment 40•8 years ago
|
||
Reporter | ||
Comment 41•8 years ago
|
||
(In reply to Nicholas Troast [:ntroast] from comment #35) > The new patch was uploaded by ggrisco. It may not make it into the next AU, > but it should be coming. AU 115 still does not have the latest patch. Should come in next AU.
Comment 42•8 years ago
|
||
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.116 Moz BuildID: 20150323002504 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.116.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7f367fc98ffdd183f21d2cdfe20556ab877ece34 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=3970b1e517b6447237421c1f9e7845c00bde0838 Patches: bug 1133398, bug 1143694, bug 1146987, bug 1145724, bug 1147646, bug 1133147, bug 1139469, bug 1142770
Comment 43•8 years ago
|
||
Comment 44•8 years ago
|
||
Assignee | ||
Comment 45•8 years ago
|
||
Ok, so what's happening here is that this sequence of inputs: 1) touchdown 2) touchup 3) touchdown 4) touchmove (possibly multiple) arrives on the libui thread all at once, and then the stuff posted to the controller thread gets to run. When the DispatchTouchNonMoveEvent for (2) gets run, it clears out the move event queue, which includes the events from (4). Therefore when DispatchTouchMoveEvents for (4) runs the queue is empty and it craps out. We already have code to deal with interleaved move and non-move events for the resampling case, we just need to use it for the non-resampling case as well.
Assignee | ||
Comment 46•8 years ago
|
||
This should fix it. Can you apply this patch and see if you still have the same crash?
Flags: needinfo?(ntroast)
Attachment #8585440 -
Flags: review?(mchang)
Comment 47•8 years ago
|
||
Comment on attachment 8585440 [details] [diff] [review] Fix Review of attachment 8585440 [details] [diff] [review]: ----------------------------------------------------------------- nit: Can we please add an assert that mTouchMoveEvents.IsEmpty() here - https://dxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp?from=GeckoTouchDispatcher.cpp&case=true#172
Attachment #8585440 -
Flags: review?(mchang) → review+
Assignee | ||
Comment 48•8 years ago
|
||
Added assert as requested, landed: https://hg.mozilla.org/integration/b2g-inbound/rev/8d9394e77702
https://hg.mozilla.org/mozilla-central/rev/8d9394e77702
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Updated•8 years ago
|
blocking-b2g: 2.2? → 2.2+
Comment 50•8 years ago
|
||
Thanks for the patch ! Just a reminder that we'll need an approval, Kats :)
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 51•8 years ago
|
||
Yeah I'll do that once we have verification it fixes the issue and the regressions are taken care of.
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 53•8 years ago
|
||
Yes it should. Fixing, but leaving ni to wes as an FYI.
Target Milestone: mozilla39 → mozilla40
Comment 54•8 years ago
|
||
I was going to wait for this patch to land, but since you are waiting for our test results to approve then I will go ahead and add this patch to our patch tree.
Flags: needinfo?(ntroast)
Assignee | ||
Comment 55•8 years ago
|
||
Thanks. Please also add the patch from bug 1149412 on top (or just take out the assert from this patch).
Ugh, mcMerge hadn't picked up the version bump when I ran that, and I forgot to manually change them.
Flags: needinfo?(wkocher)
Comment 57•8 years ago
|
||
ignore |
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.117 Moz BuildID: 20150323002504 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.117.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=7f367fc98ffdd183f21d2cdfe20556ab877ece34 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=3970b1e517b6447237421c1f9e7845c00bde0838 Patches: bug 1133398, bug 1143694, bug 1146987, bug 1145724, bug 1147646, bug 1133147, bug 1139469, bug 1142770
Comment 58•8 years ago
|
||
ignore |
Comment 59•8 years ago
|
||
ignore |
Comment 60•8 years ago
|
||
The above AU does not include the fix. It may not come in until AU 119. Also, when you are ready to approve the patch then please also approve the patch from bug 1149412 for v2.2
Comment 61•8 years ago
|
||
ignore |
Observed on: Device: msm8909 Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.118 Moz BuildID: 20150330002503 Manifest: https://www.codeaurora.org/cgit/quic/lf/b2g/manifest/tree/caf_AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.118.xml?h=release Gecko Version: 37.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=473cd63f53c855299b719285d9b95e3f2910782f Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e44e28d2a37258fe0c908c59f2e91e5287777b40 Patches: bug 1133398, bug 1143694, bug 1146987, bug 1145724, bug 1147646, bug 1133147, bug 1142770
Comment 62•8 years ago
|
||
ignore |
Comment 63•8 years ago
|
||
Attachment #8585440 -
Flags: approval-mozilla-b2g37?
Assignee | ||
Comment 64•8 years ago
|
||
Comment on attachment 8585440 [details] [diff] [review] Fix Generally the patch author requests uplift approval as a risk assessment of the uplift needs to be made. I am still waiting for some reasonable confirmation that this fixes the issue reported (in this case that a build with the patch has been running for a few days without crashing).
Flags: needinfo?(bbajaj)
Attachment #8585440 -
Flags: approval-mozilla-b2g37?
Reporter | ||
Comment 65•8 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #64) > Comment on attachment 8585440 [details] [diff] [review] > Fix > > Generally the patch author requests uplift approval as a risk assessment of > the uplift needs to be made. I am still waiting for some reasonable > confirmation that this fixes the issue reported (in this case that a build > with the patch has been running for a few days without crashing). Thanks for the explanation :) Should we ni? the patch author with the information when it's available? I'm afraid we delay the uplift since being aware of non-crashes is a little more hard to track from our side. For this bug we're fairly certain that it is fixed since we have a considerable amount of time testing on AU119 without having seen it again.
Assignee | ||
Comment 66•8 years ago
|
||
(In reply to Greg Grisco from comment #65) > For this bug we're fairly certain that it is fixed since we have a > considerable amount of time testing on AU119 without having seen it again. That's all I needed to hear, thanks! :)
Assignee | ||
Comment 67•8 years ago
|
||
Comment on attachment 8585440 [details] [diff] [review] Fix NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): touch input resampling User impact if declined: possible crashes when touch resampling is disabled (requires flipping a pref to non-default value) Testing completed: see previous comment Risk to taking this patch (and alternatives if risky): low risk, but we should take bug 1149412 also. String or UUID changes made by this patch: none
Attachment #8585440 -
Flags: approval-mozilla-b2g37?
Assignee | ||
Updated•8 years ago
|
Updated•8 years ago
|
Attachment #8585440 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Updated•5 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•