Closed Bug 1178451 Opened 9 years ago Closed 9 years ago

[Dialer] Spamming keypad inputs during a call can cause the device to become unresponsive.

Categories

(Firefox OS Graveyard :: Gaia::Dialer, defect, P3)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.5+, b2g-v2.0 unaffected, b2g-v2.1 affected, b2g-v2.2 affected, b2g-master affected)

RESOLVED FIXED
blocking-b2g 2.5+
Tracking Status
b2g-v2.0 --- unaffected
b2g-v2.1 --- affected
b2g-v2.2 --- affected
b2g-master --- affected

People

(Reporter: NicholasN, Assigned: gsvelto)

References

()

Details

(Keywords: regression, Whiteboard: [3.0-Daily-Testing][Spark])

Attachments

(4 files)

Attached file logcat_dialer.txt
Description:
Two devices are in a call, and device A brings up the keypad. For 1.5 to 3 minutes, device A spams keypad inputs and then device B hangs up. Device A stays on the call screen and is temporarily unresponsive. On Aries the device eventually recovers, depending on how long inputs were spammed. On flame the device eventually rebooted after hanging. Although this user scenario is uncommon, the result is significantly bad.


Repro Steps:
1) Update a Aries to 20150629130633
2) Open dialer and call another device.
3) Bring up the key pad and rapidly spam inputs.
4) After 1.5-3 minutes, hang up on the second device.
5) Observe device A.


Actual:
Device hangs after spamming inputs during call.


Expected:
Device continues to function appropriately after spamming inputs.


Notes:

Environmental Variables:
Device: Aries 2.5
Build ID: 20150629130633
Gaia: 0b166043ef2a1f235a4d7d4f40a51b625784195a
Gecko: e137fc38c431
Gonk: 2916e2368074b5383c80bf5a0fba3fc83ba310bd
Version: 41.0a1 (2.5)
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0


Repro frequency: 4/5
See attached: video clip, logcat
This issue also reproduced on Flame 2.5 and Flame 2.2, but not Flame 2.1 or Flame 2.0

Flame 2.0 (did not occur)

Actual Result:

After spamming numbers the device did not slow down or hang.

Environmental Variables:
Device: Flame 2.0
BuildID: 20150629000206
Gaia: 5552bf529d3d6775a968942e9afa6c1d4037362c
Gecko: b11263ce4493
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 32.0 (2.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Flame 2.1 (did not occur)

Actual Result:

After spamming numbers the device did not slow down or hang.

Environmental Variables:
Device: Flame 2.1
BuildID: 20150628001204
Gaia: f8b848c82d1ed589f7a1eb5cc099830c867ff1d4
Gecko: d3e432c4546a
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 34.0 (2.1) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Flame 2.2 (Occurs)

Actual Result:

Spamming numbers during a call caused the phone to hang and eventually reboot automatically.


Environmental Variables:
Device: Flame 2.2
BuildID: 20150626162505
Gaia: 0179935627012dfde3ca036c9a71035be463b7ad
Gecko: 330f52ef6a2d
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0


Flame 2.5 (Occurs)

Actual Result:

Spamming numbers during a call caused the phone to hang and eventually reboot automatically.


Environmental Variables:
Device: Flame 2.5
BuildID: 20150629010206
Gaia: 8a1e4ae522c121c5cacd39b20a5386ec9055db82
Gecko: eaf4f9b45117
Gonk: a4f6f31d1fe213ac935ca8ede7d05e47324101a4
Version: 41.0a1 (2.5) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Keywords: regression
Whiteboard: [3.0-Daily-Testing][Spark]
[Blocking Requested - why for this release]: Regression that happened after 2.1.
blocking-b2g: --- → 2.5?
Comms triage: Regression
blocking-b2g: 2.5? → 2.5+
Requesting a window.
Flags: needinfo?(pbylenga)
QA Contact: jmercado
This issue actually is affected on 2.1, but I was unable to reproduce this on 2.0.

Environmental Variables:
Device: Flame 2.1
BuildID: 20150629100738
Gaia: 7080a7c28b0242f81d689d2339dfa1177e23f48f
Gecko: c201e76c63d1
Version: 34.0 (2.1) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Environmental Variables:
Device: Flame 2.0
BuildID: 20150629192043
Gaia: 5552bf529d3d6775a968942e9afa6c1d4037362c
Gecko: 88eedcc6fd9b
Version: 32.0 (2.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
I was able to reproduce this issue in Central all the way back to the earliest Flame build we have available.  This is a 2.0 build which means that it was likely fixed at some point in the 2.0 Branch but we don't have access to builds from JB Flame 2.0 branch anymore so I can't continue this window.

Environmental Variables:
Device: Flame 2.0
BuildID: 20140506163010
Gaia: 98ca8c55dbe2f21a8661d0eaa87f34d316c3bc98
Gecko: 4e4e0f502969
Version: 32.0a1 (2.0) 
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Assignee: nobody → drs
Identified as paper cut. P3
Priority: -- → P3
Status: NEW → ASSIGNED
Hi Doug, any progress here?
Flags: needinfo?(drs)
I've reproduced this on the emulator and grabbed a profile while this happens, you can find it here:

http://people.mozilla.org/~bgirard/cleopatra/#report=3cffd416ad1238cd1beedc6e465230f1dcd953e6&filter=[{%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A391743,%22end%22%3A396986}]&selection=0,1,24,25,26,27,28,29,30,31,36,37

The problem is completely different than what I originally thought: every time the user taps a key we try to adapt the font of the text so that it would fit within the dialer display without ellipsis. Past a certain point this is obviously impossible as we've got a minimal font size and then we start showing an ellipsis. The issue is that as the user keeps typing we keep trying to measure the text to see if we can find a "good" font size to display it entirely and this becomes more and more expensive as the string length becomes longer. In the end we spend 100s of ms within CanvasRenderingContext2D.measureText() and the phone becomes unresponsive because of this; this is made worse by the fact that the callscreen now lives in the system app (facepalm) and thus starves the main app's main thread which is busy computing font sizes all the time (double-facepalm).

The offending line is here:

https://github.com/mozilla-b2g/gaia/blob/9d4b4b23eb73ac0bd4ad65ce14390c9e94087235/shared/js/font_size_utils.js#L167

Solving this won't be easy; one way to do this is to cache the last string we've measured and if we've already hit the smallest font size and a longer string is submitted which starts with the cached one then we can return the smallest font size directly without measuring it. This is going to be a little tricky though as we need to ensure that everything else stayed the same between calls, besides this is shared code so changing it might break stuff elsewhere so we need to be extra-cautious.

Alternatively we can implement this fix in the dialer code: past a certain point stop requesting new font sizes as you already know it's not going to yield a different one.
Slight change in my analysis. The FontSizeUtils is calling CanvasRenderingContext2D.measureText() in a loopm within the getOverflowCount() function always starting from 0 characters which means that function has an O(n^2) cost, ouch. The fix is trivial though, we can cache the last overflow count and start from there, discarding the result and starting over if we find it's the wrong starting value.
Stealing the bug.
Assignee: drs → gsvelto
Flags: needinfo?(drs)
Comment on attachment 8676303 [details] [review]
[gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master

This PR tries to reuse the last overflow count computation instead of starting over each time. This works well for repeated calls with the same string prefix, e.g.: "a", "ab", "abc", "abcd", etc... In other scenarios if the first computation doesn't yield a correct result we fall back to the original behavior.

Michael can you review this? It's shared code and you wrote most of it so I think you're the best candidate to review my change.
Attachment #8676303 - Flags: review?(mhenretty)
Thanks for the patch Gabriele, I'll take a look at this tomorrow. I'm wondering if the gaia font-fit web component [1] has the same problem, or if not if we should just switch over to using that. I'll have more time to deep dive tomorrow.

https://github.com/gaia-components/font-fit/blob/08114e2/font-fit.js#L57
In font-fit we're caching the canvas context, but not the result itself (I tried it in the past and it didn't bring improvements). Used in gaia-header it doesn't have the same behavior either as we're throttling the font-fit call.
Comment on attachment 8676303 [details] [review]
[gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master

Nice fix! I left some comments on github. Also, let's add a unit test or two for this optimization [1].

1.) https://github.com/mozilla-b2g/gaia/blob/47dd6cf/apps/sharedtest/test/unit/font_size_utils_test.js
Attachment #8676303 - Flags: review?(mhenretty)
I'd like to use the gaia font-fit component in the dialer but such a big change is too risky so close to the 2.5 branching date. I'll address your comments and post an updated PR soon, thanks!
Comment on attachment 8676303 [details] [review]
[gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master

Pushed a new patch with a unit-test added and a comment. I didn't factorize those two lines into a function because we'd need two outputs: the result width and the substring so they're not really a good candidate for putting them in a single function.
Attachment #8676303 - Flags: review?(mhenretty)
Comment on attachment 8676303 [details] [review]
[gaia] gabrielesvelto:bug-1178451-cache-last-overflow-count-computation > mozilla-b2g:master

Thanks!
Attachment #8676303 - Flags: review?(mhenretty) → review+
Thanks for reviewing this, merged to gaia/master 41b23d23f0ad911f3e7763b9fff495c6a8bcfedc

https://github.com/mozilla-b2g/gaia/commit/41b23d23f0ad911f3e7763b9fff495c6a8bcfedc
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Hi Gabriele,

    Could you please take a look at the results below again? Please note that user need input more spamming in the call, then device hangs.
------------------------------------------------------------------------------------------------
This bug has been verified as "Fail" on the latest build of Flame KK v2.5 and Aries KK v2.5 by the STR in comment 0.

Actual results: Spam inputting many numbers during a call can cause the device to become unresponsive.
See attachments: Verified_logcat_1551.txt Verified_Aries_v2.5.3gp
Reproduce rate: 10/10

Device: Aries KK 2.5 (Fail)
Build ID               20151026223015
Gaia Revision          b564b21e08ffc3e4962f08850843c7482932ee7b
Gaia Date              2015-10-26 14:22:21
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/6c7c983bce46a460c2766fbdd73283f6d2b03a69
Gecko Version          44.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.worker.20151026.214940
Firmware Date          Mon Oct 26 21:49:48 UTC 2015
Bootloader             s1

Device: Flame KK 2.5 (Fail)
Build ID               20151026030217
Gaia Revision          a677ddd3aa3a81058775938bd56008d96dbc78b0
Gaia Date              2015-10-26 04:48:31
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/5ca03a00d26823ce91ee0eaa2937bed605bd53c1
Gecko Version          44.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20151026.070911
Firmware Date          Mon Oct 26 07:09:23 EDT 2015
Firmware Version       v18D v4 
Bootloader             L1TC000118D0
Attached video Verified_Aries_v2.5.3gp
Flags: needinfo?(gsvelto)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][MGSEI-Triage+]
Thanks for checking again Sharon, I've tried spamming the keypad on my device for 90s but I can't seem to hit the issue anymore. I've noticed another issue though it seems that we're "piling up" the audible tones for each key and we keep playing them long after the user has finished typing I'll file a separate bug for that. I'm now looking at the RIL log, I suspect the issue might not lie in the dialer.
I finally found out what the problem is, though it's different than what I initially thought. The issue isn't caused by simply spamming the keyboard, it just doesn't manifest if you do it with only one finger, it's caused by multi-tap. When typing really fast we can often hit two keys at the same time and that triggers an area of the code that's known to cause problems. I'll open a separate bug for that and fix it there.
Flags: needinfo?(gsvelto)
Blocks: 1219249
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: