Closed
Bug 867360
Opened 12 years ago
Closed 12 years ago
Watcher dies sometimes, because it is not running as a foreground service
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla24
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
Attachments
(5 files, 4 obsolete files)
|
27.37 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
|
4.94 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
|
1.39 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
|
2.16 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
|
2.66 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
In some robocop failure logs from Pandaboards, we see the Watcher die for no apparent reason. This may be unrelated to robocop -- robocop tests will be particularly sensitive to the keyguard, even if it is only displayed for a short time (it looks like the watcher is / would be restarted in 5 seconds).
One example (see logs in blocked bugs for others):
https://tbpl.mozilla.org/php/getParsedLog.php?id=22418706&tree=Mozilla-Inbound
04-30 09:48:11.492 I/ActivityManager( 1401): No longer want com.mozilla.watcher (pid 1884): hidden #16
04-30 09:48:11.492 W/ActivityManager( 1401): Scheduling restart of crashed service com.mozilla.watcher/.WatcherService in 5000ms
04-30 09:48:11.570 W/WindowManagerService.mKeyguardTokenWatcher( 1401): cleaning up leaked reference: watcher
| Assignee | ||
Comment 1•12 years ago
|
||
Notice the answer here about "empty processes":
http://stackoverflow.com/questions/14043442/no-longer-want-activitymanager-not-a-service-issue
| Assignee | ||
Comment 2•12 years ago
|
||
Oops, maybe not. I was confusing "empty" and "hidden". Watcher already calls startForeground.
| Assignee | ||
Comment 3•12 years ago
|
||
Mobile platform team discussed this bug as a "deep dive" item today. Main ideas:
- add low memory logging to watcher to help determine root cause
- use an alternate wake lock (don't rely on watcher):
- eg. http://stackoverflow.com/questions/5780280/how-can-i-see-which-wakelocks-are-active
- or use an extension in fennec
| Assignee | ||
Comment 4•12 years ago
|
||
Some relevant and possibly useful info:
http://code.google.com/p/android/issues/detail?id=14340
http://stackoverflow.com/questions/6645193/foreground-service-being-killed-by-android
Assignee: nobody → gbrown
| Assignee | ||
Comment 5•12 years ago
|
||
| Assignee | ||
Comment 6•12 years ago
|
||
| Assignee | ||
Comment 7•12 years ago
|
||
All of this needs testing and debugging, especially using the techniques described at http://stackoverflow.com/questions/6645193/foreground-service-being-killed-by-android. I should get back to this in a few days.
| Assignee | ||
Comment 9•12 years ago
|
||
So far I have only really tested on a Galaxy S/Android 2.2.1, and this is preliminary, but I am seeing some troubling issues:
- Watcher is restarted approximately every 30 minutes, even when nothing else is running (no tests, no browser, nothing)
- dumpsys activity processes shows:
*APP* UID 10085 ProcessRecord{485b7a00 4646:com.mozilla.watcher/10085}
dir=/data/app/com.mozilla.watcher-2.apk publicDir=/data/app/com.mozilla.watcher-2.apk data=/data/data/com.mozilla.watcher
packageList=[com.mozilla.watcher]
thread=android.app.ApplicationThreadProxy@4842fc60 curReceiver=null
pid=4646 starting=false lastPss=0
lastActivityTime=7370804 lruWeight=7250804
hidden=false empty=true
oom: max=15 hidden=10 curRaw=2 setRaw=2 cur=2 set=2
curSchedGroup=1 setSchedGroup=1
setIsForeground=false foregroundServices=false forcingToForeground=null
persistent=false removed=false persistentActivities=0
adjSeq=3873 lruSeq=439
services=[ServiceRecord{484b1ef8 com.mozilla.watcher/.WatcherService}]
It looks like the Watcher is not running as a foreground service.
SUTagent is in better shape:
*APP* UID 10082 ProcessRecord{484fbc00 4082:com.mozilla.SUTAgentAndroid/10082}
dir=/data/app/com.mozilla.SUTAgentAndroid-2.apk publicDir=/data/app/com.mozilla.SUTAgentAndroid-2.apk data=/data/data/com.mozilla.SUTAgentAndroid
packageList=[com.mozilla.SUTAgentAndroid]
thread=android.app.ApplicationThreadProxy@4832e238 curReceiver=null
pid=4082 starting=false lastPss=4398080
lastActivityTime=1898773 lruWeight=1898773
hidden=false empty=false
oom: max=15 hidden=14 curRaw=1 setRaw=1 cur=1 set=1
curSchedGroup=0 setSchedGroup=0
setIsForeground=false foregroundServices=true forcingToForeground=null
persistent=false removed=false persistentActivities=0
adjSeq=3873 lruSeq=317
activities=[HistoryRecord{485007b0 com.mozilla.SUTAgentAndroid/.SUTAgentAndroid}]
services=[ServiceRecord{48516b40 com.mozilla.SUTAgentAndroid/.service.ASMozStub}]
receivers=[ReceiverList{48613f58 4082 com.mozilla.SUTAgentAndroid/10082 remote:48572dd8}]
| Assignee | ||
Comment 10•12 years ago
|
||
Also note:
App #12: adj=svc /B 485b7a00 4646:com.mozilla.watcher/10085 (started-services)
vs.
App # 1: adj=vis /F 484fbc00 4082:com.mozilla.SUTAgentAndroid/10082 (foreground-service)
| Assignee | ||
Comment 11•12 years ago
|
||
SUTagent and Watcher have virtually identical code for calling setForeground/startForeground, and the code is functionally identical to the sample code at http://developer.android.com/reference/android/app/Service.html.
The problem is that there are calls to getClass() (return the class of the calling object), and Watcher calls getClass from a thread, rather than from the Service class. As a result, the getMethod calls throw NoSuchMethodException (on all platforms, so both tegras and pandas are affected) and Watcher then silently fails to request foreground status. Subsequently, Watcher has a relatively high chance of being killed by Android, especially on ICS+, which cleans up background services more aggressively.
| Assignee | ||
Comment 12•12 years ago
|
||
Update/improve the logging in Watcher, especially in code that requests running as a foreground service. Also changes all Watcher logging so that it consistently uses "Watcher" as the log tag. Tidied spacing/bracketing.
Attachment #750727 -
Attachment is obsolete: true
Attachment #754583 -
Flags: review?(jmaher)
| Assignee | ||
Comment 13•12 years ago
|
||
Similar logging changes for the corresponding code in sutAgent.
Attachment #754584 -
Flags: review?(jmaher)
| Assignee | ||
Comment 14•12 years ago
|
||
Adds onLowMemory and onTrimMemory handlers to Watcher. Android may call these when the system is running low on memory. We don't currently have any evidence that the Watcher is being closed due to low memory conditions, but the question arises whenever we investigate a bug such as this, where the Watcher unexpectedly dies.
Attachment #750729 -
Attachment is obsolete: true
Attachment #754586 -
Flags: review?(jmaher)
| Assignee | ||
Comment 15•12 years ago
|
||
sutAgent already implements onLowMemory but not onTrimMemory.
Attachment #750731 -
Attachment is obsolete: true
| Assignee | ||
Updated•12 years ago
|
Attachment #754587 -
Flags: review?(jmaher)
| Assignee | ||
Comment 16•12 years ago
|
||
Finally, here's the actual fix for the problem.
In the existing code, Watcher tries to use getClass() to get the Service class, which contains the startForeground, stopForeground, and setForeground methods. But it invokes getClass() from within a worker thread, so rather than getting the Service class, it gets a Thread class; subsequently, the getMethod calls fail, and the Watcher service is never made a foreground service.
This patch simply uses Class.forName to get the watcher service class.
Attachment #754589 -
Flags: review?(jmaher)
Comment on attachment 754583 [details] [diff] [review]
(1) improve logging in watcher
Review of attachment 754583 [details] [diff] [review]:
-----------------------------------------------------------------
A lot of little papercuts here, great to see it cleaned up. I am concerned about the increased logging from the Watcher. I recall debugging issues in sut/watcher last year and finding that too many log messages from the Watcher were problematic. With that said, the messages here are not going to be showing up every second, so they might be just fine.
Keep in mind that we are running watcher 1.14 on the tegras and 1.15 on the pandas. We don't have a solid method for updating the watcher in the field.
Attachment #754583 -
Flags: review?(jmaher) → review+
Updated•12 years ago
|
Attachment #754584 -
Flags: review?(jmaher) → review+
Comment on attachment 754586 [details] [diff] [review]
(3) add onLowMemory and onTrimMemory handlers to watcher
Review of attachment 754586 [details] [diff] [review]:
-----------------------------------------------------------------
I will assume we have direct access to System.gc() and that this hasn't already happened with other apps receiving the low memory notification.
Attachment #754586 -
Flags: review?(jmaher) → review+
Updated•12 years ago
|
Attachment #754587 -
Flags: review?(jmaher) → review+
Comment on attachment 754589 [details] [diff] [review]
(5) fix the actual problem: get the correct Service class
Review of attachment 754589 [details] [diff] [review]:
-----------------------------------------------------------------
I would like to see better error handling in here rather than logging an error condition and moving forward (and consequently hitting more and more errors)
::: build/mobile/sutagent/android/watcher/WatcherService.java
@@ +339,4 @@
> mNM = (NotificationManager)getSystemService(NOTIFICATION_SERVICE);
> try {
> + mStartForeground = serviceClass.getMethod("startForeground", mStartForegroundSignature);
> + mStopForeground = serviceClass.getMethod("stopForeground", mStopForegroundSignature);
so what happens if our previous try fails, then serviceClass is not defined at all?
Attachment #754589 -
Flags: review?(jmaher) → review-
| Assignee | ||
Comment 20•12 years ago
|
||
I just added a "return" to the exception condition. Now if Class.forName throws, we log the error, then return without setting mSetForeground, etc. Watcher execution continues normally, but does not run as a foreground service (just as it does today!)
I prefer to not treat this as a fatal error since not running the Watcher could lead to more serious consequences (sutAgent not restarted) with about the same chance that the condition goes undetected.
Attachment #754589 -
Attachment is obsolete: true
Attachment #754807 -
Flags: review?(jmaher)
Comment on attachment 754807 [details] [diff] [review]
(5) fix the actual problem: get the correct Service class
Review of attachment 754807 [details] [diff] [review]:
-----------------------------------------------------------------
thanks!
Attachment #754807 -
Flags: review?(jmaher) → review+
| Assignee | ||
Comment 22•12 years ago
|
||
(1) https://hg.mozilla.org/integration/mozilla-inbound/rev/3cf83f18d08b
(2) https://hg.mozilla.org/integration/mozilla-inbound/rev/558421b80984
(3) https://hg.mozilla.org/integration/mozilla-inbound/rev/2f1306b50d44
(4) https://hg.mozilla.org/integration/mozilla-inbound/rev/87ce589005bf
(5) https://hg.mozilla.org/integration/mozilla-inbound/rev/f6e0b536c2df
| Assignee | ||
Updated•12 years ago
|
Summary: Watcher dies sometimes on Pandas → Watcher dies sometimes, because it is not running as a foreground service
Comment 23•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/3cf83f18d08b
https://hg.mozilla.org/mozilla-central/rev/558421b80984
https://hg.mozilla.org/mozilla-central/rev/2f1306b50d44
https://hg.mozilla.org/mozilla-central/rev/87ce589005bf
https://hg.mozilla.org/mozilla-central/rev/f6e0b536c2df
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
You need to log in
before you can comment on or make changes to this bug.
Description
•