Closed Bug 867360 Opened 12 years ago Closed 12 years ago

Watcher dies sometimes, because it is not running as a foreground service

Categories

(Testing :: General, defect)

x86
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla24

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(5 files, 4 obsolete files)

In some robocop failure logs from Pandaboards, we see the Watcher die for no apparent reason. This may be unrelated to robocop -- robocop tests will be particularly sensitive to the keyguard, even if it is only displayed for a short time (it looks like the watcher is / would be restarted in 5 seconds). One example (see logs in blocked bugs for others): https://tbpl.mozilla.org/php/getParsedLog.php?id=22418706&tree=Mozilla-Inbound 04-30 09:48:11.492 I/ActivityManager( 1401): No longer want com.mozilla.watcher (pid 1884): hidden #16 04-30 09:48:11.492 W/ActivityManager( 1401): Scheduling restart of crashed service com.mozilla.watcher/.WatcherService in 5000ms 04-30 09:48:11.570 W/WindowManagerService.mKeyguardTokenWatcher( 1401): cleaning up leaked reference: watcher
Oops, maybe not. I was confusing "empty" and "hidden". Watcher already calls startForeground.
Mobile platform team discussed this bug as a "deep dive" item today. Main ideas: - add low memory logging to watcher to help determine root cause - use an alternate wake lock (don't rely on watcher): - eg. http://stackoverflow.com/questions/5780280/how-can-i-see-which-wakelocks-are-active - or use an extension in fennec
All of this needs testing and debugging, especially using the techniques described at http://stackoverflow.com/questions/6645193/foreground-service-being-killed-by-android. I should get back to this in a few days.
So far I have only really tested on a Galaxy S/Android 2.2.1, and this is preliminary, but I am seeing some troubling issues: - Watcher is restarted approximately every 30 minutes, even when nothing else is running (no tests, no browser, nothing) - dumpsys activity processes shows: *APP* UID 10085 ProcessRecord{485b7a00 4646:com.mozilla.watcher/10085} dir=/data/app/com.mozilla.watcher-2.apk publicDir=/data/app/com.mozilla.watcher-2.apk data=/data/data/com.mozilla.watcher packageList=[com.mozilla.watcher] thread=android.app.ApplicationThreadProxy@4842fc60 curReceiver=null pid=4646 starting=false lastPss=0 lastActivityTime=7370804 lruWeight=7250804 hidden=false empty=true oom: max=15 hidden=10 curRaw=2 setRaw=2 cur=2 set=2 curSchedGroup=1 setSchedGroup=1 setIsForeground=false foregroundServices=false forcingToForeground=null persistent=false removed=false persistentActivities=0 adjSeq=3873 lruSeq=439 services=[ServiceRecord{484b1ef8 com.mozilla.watcher/.WatcherService}] It looks like the Watcher is not running as a foreground service. SUTagent is in better shape: *APP* UID 10082 ProcessRecord{484fbc00 4082:com.mozilla.SUTAgentAndroid/10082} dir=/data/app/com.mozilla.SUTAgentAndroid-2.apk publicDir=/data/app/com.mozilla.SUTAgentAndroid-2.apk data=/data/data/com.mozilla.SUTAgentAndroid packageList=[com.mozilla.SUTAgentAndroid] thread=android.app.ApplicationThreadProxy@4832e238 curReceiver=null pid=4082 starting=false lastPss=4398080 lastActivityTime=1898773 lruWeight=1898773 hidden=false empty=false oom: max=15 hidden=14 curRaw=1 setRaw=1 cur=1 set=1 curSchedGroup=0 setSchedGroup=0 setIsForeground=false foregroundServices=true forcingToForeground=null persistent=false removed=false persistentActivities=0 adjSeq=3873 lruSeq=317 activities=[HistoryRecord{485007b0 com.mozilla.SUTAgentAndroid/.SUTAgentAndroid}] services=[ServiceRecord{48516b40 com.mozilla.SUTAgentAndroid/.service.ASMozStub}] receivers=[ReceiverList{48613f58 4082 com.mozilla.SUTAgentAndroid/10082 remote:48572dd8}]
Also note: App #12: adj=svc /B 485b7a00 4646:com.mozilla.watcher/10085 (started-services) vs. App # 1: adj=vis /F 484fbc00 4082:com.mozilla.SUTAgentAndroid/10082 (foreground-service)
SUTagent and Watcher have virtually identical code for calling setForeground/startForeground, and the code is functionally identical to the sample code at http://developer.android.com/reference/android/app/Service.html. The problem is that there are calls to getClass() (return the class of the calling object), and Watcher calls getClass from a thread, rather than from the Service class. As a result, the getMethod calls throw NoSuchMethodException (on all platforms, so both tegras and pandas are affected) and Watcher then silently fails to request foreground status. Subsequently, Watcher has a relatively high chance of being killed by Android, especially on ICS+, which cleans up background services more aggressively.
Update/improve the logging in Watcher, especially in code that requests running as a foreground service. Also changes all Watcher logging so that it consistently uses "Watcher" as the log tag. Tidied spacing/bracketing.
Attachment #750727 - Attachment is obsolete: true
Attachment #754583 - Flags: review?(jmaher)
Similar logging changes for the corresponding code in sutAgent.
Attachment #754584 - Flags: review?(jmaher)
Adds onLowMemory and onTrimMemory handlers to Watcher. Android may call these when the system is running low on memory. We don't currently have any evidence that the Watcher is being closed due to low memory conditions, but the question arises whenever we investigate a bug such as this, where the Watcher unexpectedly dies.
Attachment #750729 - Attachment is obsolete: true
Attachment #754586 - Flags: review?(jmaher)
sutAgent already implements onLowMemory but not onTrimMemory.
Attachment #750731 - Attachment is obsolete: true
Attachment #754587 - Flags: review?(jmaher)
Finally, here's the actual fix for the problem. In the existing code, Watcher tries to use getClass() to get the Service class, which contains the startForeground, stopForeground, and setForeground methods. But it invokes getClass() from within a worker thread, so rather than getting the Service class, it gets a Thread class; subsequently, the getMethod calls fail, and the Watcher service is never made a foreground service. This patch simply uses Class.forName to get the watcher service class.
Attachment #754589 - Flags: review?(jmaher)
Comment on attachment 754583 [details] [diff] [review] (1) improve logging in watcher Review of attachment 754583 [details] [diff] [review]: ----------------------------------------------------------------- A lot of little papercuts here, great to see it cleaned up. I am concerned about the increased logging from the Watcher. I recall debugging issues in sut/watcher last year and finding that too many log messages from the Watcher were problematic. With that said, the messages here are not going to be showing up every second, so they might be just fine. Keep in mind that we are running watcher 1.14 on the tegras and 1.15 on the pandas. We don't have a solid method for updating the watcher in the field.
Attachment #754583 - Flags: review?(jmaher) → review+
Attachment #754584 - Flags: review?(jmaher) → review+
Comment on attachment 754586 [details] [diff] [review] (3) add onLowMemory and onTrimMemory handlers to watcher Review of attachment 754586 [details] [diff] [review]: ----------------------------------------------------------------- I will assume we have direct access to System.gc() and that this hasn't already happened with other apps receiving the low memory notification.
Attachment #754586 - Flags: review?(jmaher) → review+
Attachment #754587 - Flags: review?(jmaher) → review+
Comment on attachment 754589 [details] [diff] [review] (5) fix the actual problem: get the correct Service class Review of attachment 754589 [details] [diff] [review]: ----------------------------------------------------------------- I would like to see better error handling in here rather than logging an error condition and moving forward (and consequently hitting more and more errors) ::: build/mobile/sutagent/android/watcher/WatcherService.java @@ +339,4 @@ > mNM = (NotificationManager)getSystemService(NOTIFICATION_SERVICE); > try { > + mStartForeground = serviceClass.getMethod("startForeground", mStartForegroundSignature); > + mStopForeground = serviceClass.getMethod("stopForeground", mStopForegroundSignature); so what happens if our previous try fails, then serviceClass is not defined at all?
Attachment #754589 - Flags: review?(jmaher) → review-
I just added a "return" to the exception condition. Now if Class.forName throws, we log the error, then return without setting mSetForeground, etc. Watcher execution continues normally, but does not run as a foreground service (just as it does today!) I prefer to not treat this as a fatal error since not running the Watcher could lead to more serious consequences (sutAgent not restarted) with about the same chance that the condition goes undetected.
Attachment #754589 - Attachment is obsolete: true
Attachment #754807 - Flags: review?(jmaher)
Comment on attachment 754807 [details] [diff] [review] (5) fix the actual problem: get the correct Service class Review of attachment 754807 [details] [diff] [review]: ----------------------------------------------------------------- thanks!
Attachment #754807 - Flags: review?(jmaher) → review+
Summary: Watcher dies sometimes on Pandas → Watcher dies sometimes, because it is not running as a foreground service
Blocks: 879089
Depends on: 876715
Blocks: 878587
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: