Closed Bug 1250909 Opened 8 years ago Closed 8 years ago

Autophone - add ability for device to reboot when it detects a missing usb connection.

Categories

(Testing Graveyard :: Autophone, defect)

defect
Not set
normal

Tracking

(firefox47 affected)

RESOLVED FIXED
Tracking Status
firefox47 --- affected

People

(Reporter: bc, Assigned: bc)

References

Details

Attachments

(7 files, 2 obsolete files)

When a device loses usb connectivity, our current solution is to gracefully shutdown Autophone and reboot the host. This has the drawback of reducing our testing capacity while the shutdown/reboot is in progress and unfortunately does not always work. We have had too frequent occurrences where devices, in particular the Nexus 6P devices, do not reconnect after the host reboot. The solution has been to ask someone to physically unplug and plug in the usb cable on the device.

I've experimented with shell based scripts which use the output from dumpsys battery or dumpsys usb to detect if the device has lost usb connectivity and to reboot if so. This works, but has issues due to the lack of nohup and running scripts as background processes. I still have a couple of ideas to try but so far the approaches I've tried aren't practical.

It is possible to detect usb attach and detach events in Android apps. See
http://developer.android.com/guide/topics/connectivity/usb/index.html. The host and accessory functionality is limited to Android 3.1/api 12+ though some parts of the accessory functionality may be available as an add-on library for Android 2.3.4/api 10. Considering api 10 support is approaching EOL, we might just ignore it for now.

If I can't get the shell script approach to work, it will be necessary to use an app. The idea would be that the app would be installed on the device by default. When we start Autophone, Autophone could start the app and have it begin monitoring usb either via receiving events or polling dumpsys battery or dumpsys usb. If the device reboots itself, Autophone will need to restart the app in order to monitor usb as well as do other initialization task such as resetting SELinux to Permissive. There is some potential for conflicts/races between the current code which reboots the host in the event of a device usb disconnection and we will need to gracefully terminate whatever test might have been running when the disconnection occurred.

The reason I've mentioned manually starting the app rather than having it automatically run as a service, is to make it impossible for us to get into an infinite reboot loop when the device is not connected via usb. With careful coding, we could eliminate that possibility but having Autophone initiate the usb monitoring seems reasonable to me.

Thoughts?
I have a preliminary version of a service which monitors the usb connection via polling dumpsys battery. I tried implementing a BroadcastReceiver that would notify the service when usb was detached but couldn't get that to work. I'm not entirely unhappy with the polling solution since I think I can fit it into Autophone's ping logic pretty well.

Once the service is installed, it will be started by the worker via 

am startservice com.mozilla.bclary.usbwatchdog

The service does not start properly on Android 2.3 but does with my gs3/Android 4.0.4 and 6p/6.0.1. I think this will be ok, especially considering Android 2.3 will be dropping from trunk builds in the next couple of weeks.

Currently, the service uses a hard coded sleep interval between polls. I don't see how to pass an argument to the service via am startservice since I didn't create any activities for it. Instead, I will have the service attempt to load a configuration file from /data/local/tmp and if it exists, use it to determine the polling interval. If the file is not found, a default value will be used. Autophone can control this setting via a command line option and by pushing the config file to the device before the service is started.

When the service detects that the device is no longer USB Powered, it will use su -c reboot to reboot the device.

I want the polling interval to be short enough that the most common case will be that the device detects that is has disconnected and reboots before Autophone notices but long enough that the repeated polls of dumpsys battery do not cause significant load.

When checking the device's state in PhoneWorkerSubprocess.ping, if get_state() does not return 'device', then the worker will sleep for a time longer then the usbwatchdog polling interval to give the device time to reboot itself.

If the device's state is 'device', the device's uptime will be retrieved and checked against a saved value to see if the device has rebooted itself since the last time it was checked. If the uptime is less than the saved uptime value, then additional work must be done to initialize SELinux etc and to make sure any currently running test is failed with a retry.

There is some duplication between PhoneWorkerSubprocess.ping and ADBAndroid.is_device_ready, but I think I'll leave that to a follow up bug.
Latest wip version supports sdk 10+, automatically detects whether su 0 or su -c should be used and whether the arguments should be quoted, allows the poll interval to be passed as an int extra to the startservice command and works on Android 2.3+.

It *requires* a non-standard su which provides su access for users other than root and shell for the reboot which I do not have on all devices.

Currently, this works for nexus-4-{1,2,3,4} and nexus-6p-{1..8}. While this is not total coverage it is an improvement and will hopefuly help alleviate the semi-frequent nexus-6p disconnection problem.

I'm not sure, but it might be possible to use the standard android su if the app were installed as a system app and the appropriate permission was requested.

Is all that is needed is to be able to mount /system as rw and to place the apk in /system/app/ ?
Attached file AndroidManifest.xml (obsolete) —
Attached file USBService.java (obsolete) —
Assignee: nobody → bob
Status: NEW → ASSIGNED
Attachment #8727375 - Flags: review?(gbrown)
Attachment #8727376 - Flags: review?(gbrown)
Attachment #8727374 - Flags: review?(gbrown)
fyi, adbd is running as root and I can remount /system rw for nexus-4-{5,6,7}; nexus-6-{1,2}; nexus-9-{1,2}. I'll experiment here locally to see if I can actually install USBWatchdog as a system app. I don't think this will allow me to run su even if it exists, but perhaps I can add the REBOOT permission and use PowerManager.reboot <http://developer.android.com/reference/android/os/PowerManager.html> when the app is installed as a system app. The current approach is sufficient for the moment, but if any one has suggestsions please feel free to share them.
Attachment #8727374 - Flags: review?(gbrown)
Comment on attachment 8727375 [details]
USBService.java

I'm pretty sure I can do the system app approach for devices which support it. New version coming up.
Attachment #8727375 - Flags: review?(gbrown)
Attached file AndroidManifest.xml
Attachment #8727822 - Flags: review?(gbrown)
Attached file USBService.java
Attachment #8727374 - Attachment is obsolete: true
Attachment #8727375 - Attachment is obsolete: true
Attachment #8727824 - Flags: review?(gbrown)
Attachment #8727822 - Attachment mime type: text/xml → text/plain
Tested this with my gs3, nexus one and nexus 6p locally. the gs3 and nexus one devices allowed me to make /system rw and I installed the app as a system app. the 6p does not permit remounting /system as rw so it was installed as a normal user app.

Also tested with nexus-4-7 and nexus-s-10 in production installed as system apps. the nexus 4 allowed adb remount to work but the nexus s required the following from a shell:
mount -o rw,remount -t ext4 /dev/block/platform/s3c-sdhci.0/by-name/system /system

They showed

adb -s $(mobile-inventory.sh -p nexus-4-7 -s) logcat -d -v time | grep USBWatch
03-08 13:49:17.943 I/USBWatchdog( 1845): Permissions granted
03-08 13:49:17.943 I/USBWatchdog( 1845): Checking battery

adb -s $(mobile-inventory.sh -p nexus-s-10 -s) wait-for-device shell logcat -d -v time | grep USBWatchdog
03-08 13:58:29.175 I/USBWatchdog(  604): Permissions granted
03-08 13:58:29.175 I/USBWatchdog(  604): Checking battery

I presume they'll reboot when required.
Comment on attachment 8727822 [details]
AndroidManifest.xml

> package="com.mozilla.bclary.usbwatchdog"

Maybe "com.mozilla.autophone.usbwatchdog"?
Attachment #8727822 - Flags: review?(gbrown) → review+
Comment on attachment 8727376 [details] [diff] [review]
bug-1250909-v1.patch

Review of attachment 8727376 [details] [diff] [review]:
-----------------------------------------------------------------

I am a little uncomfortable with the approach. This feels like a return to sutagent and watcher. But I don't have a better idea.

::: autophone.py
@@ -787,4 @@
>                      device_ready_retry_attempts=self.options.device_ready_retry_attempts,
>                      verbose=self.options.verbose,
>                      test_root=test_root)
> -                dm.power_on()

This is moved to the worker now, right?

::: worker.py
@@ +363,5 @@
>          return success
>  
> +    def start_usbwatchdog(self):
> +        if not self.dm.is_app_installed(self.options.usbwatchdog_appname):
> +            return

Would it be useful to log "usbwatchdog not installed"?
Attachment #8727376 - Flags: review?(gbrown) → review+
(In reply to Geoff Brown [:gbrown] from comment #11)
> Comment on attachment 8727822 [details]
> AndroidManifest.xml
> 
> > package="com.mozilla.bclary.usbwatchdog"
> 
> Maybe "com.mozilla.autophone.usbwatchdog"?

I can do that.

(In reply to Geoff Brown [:gbrown] from comment #12)
> Comment on attachment 8727376 [details] [diff] [review]
> bug-1250909-v1.patch
> 
> Review of attachment 8727376 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I am a little uncomfortable with the approach. This feels like a return to
> sutagent and watcher. But I don't have a better idea.
> 

This is the only way I could think of even attempting to recover from a disconnection. We can't talk to it over usb. adb over tcp/ip seems like such a can of worms.

> ::: autophone.py
> @@ -787,4 @@
> >                      device_ready_retry_attempts=self.options.device_ready_retry_attempts,
> >                      verbose=self.options.verbose,
> >                      test_root=test_root)
> > -                dm.power_on()
> 
> This is moved to the worker now, right?
> 

Yeah

> ::: worker.py
> @@ +363,5 @@
> >          return success
> >  
> > +    def start_usbwatchdog(self):
> > +        if not self.dm.is_app_installed(self.options.usbwatchdog_appname):
> > +            return
> 
> Would it be useful to log "usbwatchdog not installed"?

I don't think so. It is easy to check via pm list packages and we need to do the installation manually outside of Autophone anyway.

Let me change the app name and set up a new android studio project and put it on github and we'll be ready to go.
Comment on attachment 8727824 [details]
USBService.java

Hmm. No splinter review for text?

> mPermissions &= mPermissions && (permission == PackageManager.PERMISSION_GRANTED);

Too many &'s here? I think:

mPermissions = mPermissions && (permission == PackageManager.PERMISSION_GRANTED);

would be sufficient.

I ran this on my AOSP 4.0.1 Galaxy Nexus and got "Do not have su or sufficient permissions." I'm not sure why:

root@android:/ # su -c 'ls /data/data'                                       
com.android.apps.tag
...
root@android:/ # su -c 'id'                                                    
uid=0(root) gid=0 groups=0

root@android:/ # run-as com.mozilla.bclary.usbwatchdog
root@android:/data/data/com.mozilla.bclary.usbwatchdog $ id
uid=10038 gid=10038
root@android:/data/data/com.mozilla.bclary.usbwatchdog $ su -c 'id'
uid=0(root) gid=0 groups=0


Would you check this into autophone's github? What's the plan for building?
Attachment #8727824 - Flags: review?(gbrown) → feedback+
(In reply to Geoff Brown [:gbrown] from comment #14)
> Comment on attachment 8727824 [details]
> USBService.java
> 
> Hmm. No splinter review for text?
> 

I guess I'll have to learn that.

> > mPermissions &= mPermissions && (permission == PackageManager.PERMISSION_GRANTED);
> 
> Too many &'s here? I think:

Thanks.

> 
> mPermissions = mPermissions && (permission ==
> PackageManager.PERMISSION_GRANTED);
> 
> would be sufficient.
> 
> I ran this on my AOSP 4.0.1 Galaxy Nexus and got "Do not have su or
> sufficient permissions." I'm not sure why:
> 

That is a standard android build with standard su right? That does not support su for any users other than root or shell. That is why I needed to add the support for installation as a system app and the use of permissions and the PowerManager.

If you still have it installed as a user app, do 

adb uninstall com.mozilla.bclary.usbwatchdog

then
adb root
adb remount
adb push app-debug.apk /data/local/tmp/
adb shell
dd if=/data/local/tmp/app-debug.apk of=/system/app/app-debug.apk
chmod 644 /system/app/app-debug.apk
reboot

then

adb shell 'am startservice -n com.mozilla.bclary.usbwatchdog/.USBService  --ei poll_interval 60'

to check if it is running ok:

adb logcat -d | grep USBWatchdog

Then unplug the cable and wait 60 seconds. It should reboot.

> Would you check this into autophone's github? What's the plan for building?

I'm putting it in https://github.com/bclary/USBWatchdog

It will build as an Android Studio project.
I re-installed with your procedure and re-tested -- it works fine now.

This is an aosp full-eng build. I have had it for a while -- I'm not positive of the origin/history of su.

I am surprised that 'run-as <package> su -c ...' was not an effective simulation. Interesting.
Attachment #8727824 - Flags: review+
I'll add some instructions to the README.md and add a link from Autophone to the repo along with some more instructions/background.

Thanks! I'm running a local test with the new apk. I'll check the apk into the autophone repo so people don't have to build it. It is debug so it includes the debug signing. I don't think it is necessary to do the whole full blown signature thing for now.
I had run the original app quite extensively over the weekend without triggering a disconnected device. While I hoped this had solved the problem of recovering a disconnected device I didn't have proof.

I ran another long running test last night using the latest version with your changes and ran into problems unrelated to the changes. I am concerned this solution as currently implemented will not solve our problem.

One of my nexus-one devices disconnected and it did not recover. I was running autophone without the reboot on error option so did not get the benefit of the host reboot. It wasn't clear to me that the reboot had occurred.

I restarted and ran into the same problem some hours later with the same nexus one device. I carefully checked the uptime quickly after unplugging and plugging in the device before the running autophone instance could reboot it and it appeared it had rebooted but it wasn't absolutely clear.

I changed autophone's configuration to reboot on error and started the test again. It ran well overnight up until a few minutes ago when my workstation rebooted. A different nexus one device had disconnected. I looked at the device itself and found it was still usb charging though the host did not detect it via adb devices. The use of the signal "USB powered: false" may not be sufficient. I may need to detect if adb debugging is active instead. I should also log the reboot since it is not possible to see anything from logcat from the time the disconnection occurs and when the reboot occurs.
Attached file 6p.txt.gz
I've added checking for the adbd process. During a test, the nexus 6p device disconnected but the USBWatchdog did not detect it. I shut down the autophone instance, and did adb kill-server then adb devices and it did not recover. I rebooted the workstation and it did recover. Attaching the logcat from the device in case someone sees something. The error occurred shortly after a reboot and the first invocation of fennec around 04:34.
Carrying forward gbrown's r+.

I've pushed the USBWatchdog Android Studio project to https://github.com/bclary/USBWatchdog

This version does not attempt to use dumpsys battery to check for USB Power or the existence of the adbd process to detect when the connection is lost. I found that neither of these would detect device disconnections.

Instead it uses a "heartbeat" property which is periodically set by the worker process. When the USBWatchdog service detects that the heartbeat has not been updated within the specified polling interval it will reboot the device. This approach will accurately detect when the host can not longer communicate with the device over USB.

I also made minor changes to include the ability to pass a debug argument when starting the USBWatchdog service.

I am not certain that this will alleviate all USB disconnections which require physical intervention. I found some cases during local testing where a simple reboot of the device without rebooting the host did not reconnect the device. It is my hope this will at least reduce the number of required physical interventions.
Attachment #8731249 - Flags: review+
https://github.com/mozilla/autophone/commit/4a9b86aaf016587acedcfbe9a8bf77ff1a5d9db1

I'll close when the apk is installed and the commit deployed to the servers.
Attached file usbwatchdog-status.txt
Some of the devices report insufficient permissions. Need to look into that when time permits.
deployed 2016-03-15 ~22:46
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
doh, deployed 2016-03-16 ~22:46
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: