phone becomes sluggish and drains battery due to infinite loop of messages within DOM Settings code

RESOLVED WORKSFORME

Status

()

Core
DOM: Device Interfaces
RESOLVED WORKSFORME
3 years ago
3 years ago

People

(Reporter: dbaron, Unassigned)

Tracking

Trunk
ARM
Gonk (Firefox OS)
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

(Reporter)

Description

3 years ago
For a few weeks to a few months, I've been seeing a bug on my Flame (for which I make my own userdebug builds) where the phone would become very sluggish and drain battery quickly.  For example, sometimes I'd click a button to place a phone call, and the call would be placed 30 seconds later.  Or just general slowness in response to user input.

(Some of my previous attempts to debug it were thwarted by the setings API seeming broken; switching from user to userdebug builds has probably helped that.)

It takes a few days of use for the problem to show up, and I have no idea what the steps to reproduce are.

Running 'top -t' when this happens shows that the parent b2g process main thread is using 49% CPU.

I currently have the problem in gdb for the first time in a useful way.

A big piece of what seems to be happening is alternation between processing a "Settings:Get:OK" message and processing a "Settings:Finalize:OK" message.  A bunch of the stuff that connects this apparently-infinite loop appears to involve promises.  I'm still poking around to figure out what that connection is.  (Note that it's also hard to get JS stacks; I haven't yet tried, since I'm worried about crashing the phone and being unable to debug further.)


Also note that the revision I'm currently using does have https://hg.mozilla.org/mozilla-central/rev/69866036d793 , so that isn't the fix.
(Reporter)

Comment 1

3 years ago
Created attachment 8503863 [details]
complete debugging session

... which ended when the phone gave up and b2g restarted.  So there's nothing left to debug right now.
This is most likely Bug 1064718. Which unfortunately had no diagnosis, so it's hard to say what there is to dupe and this bug already has more info attached. ni?'ing Alex.
Flags: needinfo?(lissyx+mozillians)
Maybe you are exposing bug 1072155 ?
Flags: needinfo?(lissyx+mozillians)
Well it's been quite some time that I'm suspecting that bug 1072155 and bug 1064718 are just dupe and a manifestation of the very same bug. I'm having more troubles reproducing those these days, but considering I have been running my device with the fix for bug 1076597 at least two weeks and I still saw both (or the same?), I completely agree with david that we still have something bogus.

David, did you noticed some spamming of "geo: Debug: GPS ignored 0" in logcat? You may need to have geolocation debug  switched in Settings.
Flags: needinfo?(dbaron)
(Reporter)

Comment 5

3 years ago
I didn't actually look at logcat when this was happening.  (Or are you asking if I see it in general, even when this isn't happening?)
Flags: needinfo?(dbaron)
(Reporter)

Comment 6

3 years ago
Created attachment 8510807 [details]
another gdb session

Spent too much of the time while gdb was working reading code, and then it crashed.

(Also, I had no network since I was relying on tethering to my phone for network.)

I wonder if the geolocation.debugging.enabled string within that chunk of memory is relevant.
You could just change the "const DEBUG = false;" lines to true in SettingsRequestManager.jsm and SettingsManager.js. Logcat would be pretty noisy but if it started happening you wouldn't be forced to gdb it I think.
(Reporter)

Comment 8

3 years ago
Saw this again today and had to reboot the phone without debugging; I'm starting to think it may be related to geolocation.
(In reply to David Baron [:dbaron] (UTC-4) (needinfo? for questions) (away/busy Oct. 24-31) from comment #8)
> Saw this again today and had to reboot the phone without debugging; I'm
> starting to think it may be related to geolocation.

Looks like I saw it also. Maybe playing with FxStumbler helps.
Created attachment 8511532 [details]
logcat after 2 minutes of FxStumbler, turning off geolocation
Created attachment 8511533 [details]
logcat after 2 minutes of FxStumbler, turning on geolocation again
STR:
 0. Reboot
 1. Record "adb logcat -v threadtime | egrep Debug"
 2. Start HERE Maps, let it run for a couple of seconds, kill
 3. Turn off then on Geolocation in Settings app

Expected:
 Something better ...

Actual:
 It looks like as soon as I repeat step 2 and 3 multiple times, the number of outputted lines for step 1 ("GeckoConsole: geo: Debug: GPS ignored 0") grows. For example, 6 lines on a first launch/kill of HERE Maps and re-enabling geolocation, and more than 20 lines after doing it twice.

Given that "GeckoConsole: geo: Debug: GPS ignored 0" is triggered by a geolocation.debugging.enabled mozsettings-change event, I fear something bad happens.
Flags: needinfo?(dbaron)
So in GonkGPSGeolocationProvider, a listener for settings changes is added at: http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/GonkGPSGeolocationProvider.cpp#823

This will be handled by the Observe() method at http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/GonkGPSGeolocationProvider.cpp#993 and that code does a request to a local RequestSettingValue() method.

This will be handled at http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/GonkGPSGeolocationProvider.cpp#395:
 - we get the settings service
 - we create a new lock, and request the value

This means that on each observed geolocation setting change triggered, we will create two new locks to read those values, when the observer already has the proper value when the observer gets triggered.

I'm not sure yet if this could explain the issue we are tracking there, but this looks much more than suboptimal already.
Blocks: 1089619
(Reporter)

Updated

3 years ago
No longer blocks: 1089619
Depends on: 1089619
Flags: needinfo?(dbaron)
Closing since no more manifestation since bug 1089619 landed.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.