Closed Bug 1078645 Opened 10 years ago Closed 6 years ago

Autophone - support/document use of emulators instead of physical devices

Categories

(Testing Graveyard :: Autophone, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bc, Unassigned)

Details

Attachments

(2 files, 8 obsolete files)

      No description provided.
Assignee: nobody → gbrown
To start looking into this, I created a new avd from Android's "Manage AVDs..." based on the "Nexus 5" device. I kept the defaults - API level 19, arm, etc - added an SD Card and made sure to select "Use Host GPU". 

I started the emulator simply with "emulator -avd <myavdname>" and cleared the lock screen.

adb devices showed:

List of devices attached 
emulator-5554	device

so I set up my devices.ini with:

serialno=emulator-5554

I started autophone and it connected to the emulator.
But there's a problem with rebooting because "adb reboot" hangs on the emulator.

2014-10-10 08:00:13,898|INFO|gbrown-emu|6804|Rebooting phone...
2014-10-10 08:00:13,898|DEBUG|gbrown-emu|6769|PhoneWorker:process_msg
2014-10-10 08:05:13,997|ERROR|gbrown-emu|6804|Exception while rebooting!
Traceback (most recent call last):
  File "/home/gbrown/autophone/worker.py", line 304, in recover_phone
    if self.dm.reboot():
  File "/home/gbrown/autophone/adb_android.py", line 147, in reboot
    self.command_output(["reboot"], timeout=timeout)
  File "/home/gbrown/autophone/adb.py", line 753, in command_output
    timeout=timeout)
  File "/home/gbrown/autophone/adb.py", line 253, in command_output
    raise ADBTimeoutError("%s" % adb_process)
ADBTimeoutError: args: adb -s emulator-5554 wait-for-device reboot, exitcode: None, stdout: , stderr:
As we talked about on irc, I think what we will need to do is something like:

a custom emulator reboot that:

* calls reboot on the emulator to at least tell android it is going to shut down. We can use a relatively short timeout and catch the ADBTimeoutError.
* then adb -s emulator-xxxx emu kill to kill the emulator
* then call emulator -avd ... to restart the emulator.

It may be an issue if we have more than one emulator running and our emulator-xxxx serial changes though.

Even though mozdevice is the source of truth for adb*.py, you can develop your patch in git against the autophone version and get it working before we try to patch the mozdevice version.
I looked at some possible reboot alternatives. 

adb shell stop / adb shell start will stop and restart zygote (and many associated services) but that's not quite a reboot, and occasionally hangs.

telnet localhost 5554 + avd stop / avd start works reliably but again does not reboot or even restart anything -- it seems to just suspend the emulator.

I would prefer to avoid restarting the emulator(s) if possible. I wonder if any action is really required at all -- could we skip reboots for emulators?
I can't swear to the details, but I seem to recall that I needed to reboot after some uninstallations to completely remove the previous apps. We can preserve the port via emulator -port <port>.

What is the downside to killing/starting the emulator?
I'm mostly worried about problems with not getting the same emulator port / serial name on restart. The -port option makes me more hopeful of a clean solution.
I worked around the reboot issue, but found that tests were still not running on the emulator. We were getting stuck in the CHARGING state, because the emulator reports the battery as 50% charged. I updated device_battery_min/max in autophone.ini and tests started working. I don't think any code changes are needed for this, but it is something to note in documentation.
2014-10-10 11:32:32,589|DEBUG|gbrown-emu|9163|shell_output: adb -s emulator-5554 wait-for-device shell dumpsys battery; echo rc=$?, timeout: None, root: False, timedout: None, exitcode: 0, output: Current Battery Service state:
  AC powered: true
  USB powered: false
  Wireless powered: false
  status: 2
  health: 2
  present: true
  level: 50
  scale: 100
  voltage: 0
  temperature: 0
  technology: Li-ion
With a reboot work-around and device_battery_min/max set appropriately, autophone installs and launches Fennec on the emulator, but something goes terribly wrong shortly thereafter. adb operations start timing out and the emulator becomes completely unresponsive -- it cannot even be closed. This *seems* to happen randomly, in that the first operation to timeout seems to vary from one autophone run to another.
:-(

2G RAM?
200M Internal storage?
How much storage did you put on the sdcard?

Does this happen with the private emulator builds we using in unit tests

Can you run with with loglevel = DEBUG and verbose = True and send me autophone.log and the autophone-<device>.log ?
The problems I reported on 2014-10-10 magically disappeared: I ran autophone against the same emulator several times today but could not reproduce the emulator/adb hang.

(I am using an Android 19 AVD with 2G RAM, 200M Internal, 200M sdcard. I have not tried the emulator builds we use for unit tests.)
Attachment #8505006 - Flags: review?(bclary) → review+
(In reply to Bob Clary [:bc:] from comment #3)
> As we talked about on irc, I think what we will need to do is something like:
> 
> a custom emulator reboot that:
> 
> * calls reboot on the emulator to at least tell android it is going to shut
> down. We can use a relatively short timeout and catch the ADBTimeoutError.
> * then adb -s emulator-xxxx emu kill to kill the emulator
> * then call emulator -avd ... to restart the emulator.
> 
> It may be an issue if we have more than one emulator running and our
> emulator-xxxx serial changes though.

To restart the emulator, we need to at least know the "-avd" argument; ideally, we want to use all the same arguments originally passed to "emulator". I tried parsing the process list to find "emulator" and note its command line, and cwd. But "emulator" forks "emulator64-arm" (or similar, depending on platform), leaving no trace of the original "emulator" call. Restarting the emulator with the emulator64-arm command line and cwd fails when libraries are not found. I suspect "emulator" sets LD_LIBRARY_PATH, or similar, when launching emulator64-arm; I don't think it is possible to inspect emulator64-arm's environment. The best way forward that I can find is to get the command line from the process list, but modify cmd[0] to launch "emulator" instead of the existing child process. This works for me, but the whole thing feels a little brittle!
This works for me!
Attachment #8505833 - Flags: review?(bclary)
Comment on attachment 8505833 [details] [diff] [review]
restart emulator instead of rebooting

Review of attachment 8505833 [details] [diff] [review]:
-----------------------------------------------------------------

r- for some architectural issues.

We should use psutils 2.1 or later and should update requirements.txt to
reflect that.

I think that the emulator specific code can be used by both android and b2g.
As such, I it should live in ADBDevice and not ADBAndroidMixin.

It should be initialized when the other aspects of the device are detected
in the constructor. We could store the port, the avd, etc and processid of
the emulator as attributes of the ADBDevice object.

If we knew the avd, we could obtain the port from the serial and start the
emulator ourselves if it wasn't already running. Perhaps an optional
argument to ADBDevice(..., avd_name=None). This won't help Autophone at
the moment but would be helpful for users of mozdevice.adb.

I think a simple ADBDevice._stop_emulator and ADBDevice._start_emulator
pair would work.

On initialization, if the emulator isn't running and we have avd_name,
we can simply call _start_emulator which should store the processid for
use by _stop_emulator. If the emulator isn't already running and we
don't know the avd_name, we should raise an Error.

When performing the reboot, if we are an emulator, we can simply
call _stop_emulator, then _start_emulator.

Later we can work on moving the basic reboot/restart code into
ADBDevice and use overridden versions of reboot in ADBAndroid which 
have the OS specific device ready detection. But that is for another
bug unless you want to handle it here.

What do you think? Let's chat on irc for a bit and get our ducks in
a row.

I think this is great and has great potential for use in both android
as well as b2g testing when we get it back into mozdevice.

::: adb_android.py
@@ +4,5 @@
>  
>  import os
>  import re
>  import time
> +import psutil

channeling mcote: PEP 8 specifies that external third-party imports should go
between standard library imports and local imports.

Let's fix all of these imports to be

import os
import re
import time
from distutils.version import StrictVersion
	
import psutil

from adb import ADBDevice, ADBError	
from adb import ADBDevice, ADBError

@@ +156,5 @@
> +        if not emuproc:
> +            for proc in psutil.process_iter():
> +                if len(proc.cmdline) > 0 and 'emulator' in proc.cmdline[0]:
> +                    emuproc = proc
> +                    break

This will pick up any emulator, but it may not be the one we want in general. I don't think we should do this.

@@ +169,5 @@
> +            # executable with 'emulator'. emulator spawns a child like
> +            # 'emulator64-arm', but starting the child directly typically
> +            # fails, perhaps because of a custom environment.
> +            cmdline = emuproc.cmdline
> +            cmdline[0] = 'emulator'

This is a psutils 1.2.x idiom. I think we should use psutils 2.1 or later and also put it in the requirements.txt.

@@ +174,5 @@
> +            cwd = emuproc.getcwd()
> +            # Terminate existing emulator
> +            self.command_output(["emu", "kill"], timeout=timeout)
> +            # Restart emulator
> +            psutil.Popen(cmdline, cwd=cwd)

We need do a polling loop to get the exitcode as well as to kill the process if it times out. If this fails, we will need to raise either and ADBError or an ADBTimeoutError

@@ +176,5 @@
> +            self.command_output(["emu", "kill"], timeout=timeout)
> +            # Restart emulator
> +            psutil.Popen(cmdline, cwd=cwd)
> +        else:
> +            self._logger.warn('unable to restart emulator: not running?')

I generally prefer the shorter block to come first. I forget whether I learned it from Kernighan and Ritcher or Kernighan and Plauger, so in general something like

if not emuproc:
  self._logger.warn('unable to restart emulator: not running?')
else:
  ....

Even so, if the emulator isn't running at all, we don't raise an error and instead wait for the commands to time out which doesn't seem optimal.
Attachment #8505833 - Flags: review?(bclary) → review-
Thanks Bob. I like most of your ideas and have tried to stay true to your vision in this patch. 

One change to note is that I allow for clients of ADBDevice to specify the whole emulator command line, rather than just the avd name -- I think that additional flexibility may come in handy some day. For instance, I sometimes pass -qemu arguments when starting an emulator.
Attachment #8505833 - Attachment is obsolete: true
Attachment #8508956 - Flags: review?(bclary)
Attachment #8508956 - Flags: review?(bclary)
Attached patch bug-1078645-v3.patch (obsolete) — Splinter Review
I added the following to the devices.ini file to define the emulator and the command needed to start it.

[nexus-10-emulator]
serialno=emulator-5556
command=emulator -avd AVD_for_Nexus_10_by_Google -port 5556 -gpu on

The sigchld handler in autophone.py is necessary to prevent zombie emulator processes from holding onto the emulated device's partitions but it causes a multiprocessing error on shutdown. I'll attach a stack.
Attached file shutdown error (obsolete) —
Comment on attachment 8511203 [details] [diff] [review]
bug-1078645-v3.patch

Review of attachment 8511203 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Bob -- good improvements here. 

Other thoughts...

::: adb.py
@@ +521,5 @@
> +                raise ADBError('ADBDevice.__init__: Emulator must have an '
> +                               'emulator command')
> +            if ('-avd' not in self._emulator_command or
> +                '-port' not in self._emulator_command or
> +                self.gpu_flag not in self._emulator_command):

This seems unnatural. 

I would not normally specify -gpu when starting an emulator -- I would specify it in the avd.

It's possible - but unusual - to start the emulator without an avd; you just need to specify all the necessary things on the command line.

For that matter, I wouldn't normally specify -port...but I think there's a fairly good reason for that requirement.

@@ +594,5 @@
> +            cmd = ' '.join(proc.cmdline())
> +            if (cmd and 'emulator' in proc.name() and ' -avd ' in cmd):
> +                if port_arg in cmd:
> +                    # Kill the emulator if it is not running with gpu
> +                    if self.gpu_flag not in cmd:

Similar concern here: I don't think we need to require -avd and -gpu. I prefer to say "start the emulator in such a way that you can run fennec -- and check that it can!" and keep the code flexible.

@@ +1763,5 @@
> +
> +        self._logger.debug('ADBDevice._start_emulator: pid: %s, name: %s' % (
> +            self._emulator_proc.pid, self._emulator_proc.name()))
> +
> +    def stop_emulator(self):

I am offended by the asymmetry: _start vs stop.

@@ +1778,5 @@
> +        if not self._is_emulator:
> +            raise ADBError("Is not an emulator.")
> +
> +        try:
> +            self.command_output(["reboot"], timeout=1.0)

I have mixed feelings about this: I'm not sure that it does anything useful.

@@ +1792,5 @@
> +        self._get_emulator_process()
> +
> +        # Kill the emulator and check/wait for emulator termination
> +        if self._emulator_proc:
> +            self._emulator_proc.terminate()

You didn't like "adb emu kill"? It seemed like the civilized way to bring down the emulator...but this is simpler.
(In reply to Geoff Brown [:gbrown] from comment #20)
> Comment on attachment 8511203 [details] [diff] [review]
> bug-1078645-v3.patch
> 
> Review of attachment 8511203 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks Bob -- good improvements here. 
> 
> Other thoughts...
> 
> ::: adb.py
> @@ +521,5 @@
> > +                raise ADBError('ADBDevice.__init__: Emulator must have an '
> > +                               'emulator command')
> > +            if ('-avd' not in self._emulator_command or
> > +                '-port' not in self._emulator_command or
> > +                self.gpu_flag not in self._emulator_command):
> 
> This seems unnatural. 
> 
> I would not normally specify -gpu when starting an emulator -- I would
> specify it in the avd.
> 
> It's possible - but unusual - to start the emulator without an avd; you just
> need to specify all the necessary things on the command line.
> 
> For that matter, I wouldn't normally specify -port...but I think there's a
> fairly good reason for that requirement.
> 
> @@ +594,5 @@
> > +            cmd = ' '.join(proc.cmdline())
> > +            if (cmd and 'emulator' in proc.name() and ' -avd ' in cmd):
> > +                if port_arg in cmd:
> > +                    # Kill the emulator if it is not running with gpu
> > +                    if self.gpu_flag not in cmd:
> 
> Similar concern here: I don't think we need to require -avd and -gpu. I
> prefer to say "start the emulator in such a way that you can run fennec --
> and check that it can!" and keep the code flexible.
> 

I want to make sure we don't *ever* accidentally identify a process as an emulator process just because it happens to have 'emulator' in the command line and the port number somewhere in the command line.

I don't think it is a burden to require the command line in devices.ini to have '-gpu on' as it will not affect an avd that already has gpu enabled and will make sure any other avd will be started with it regardless of the avd settings.

The use of -port NNNN also helps us make sure we identify the correct process when we are searching for an existing emulator. I made it an attribute though since I did relax it while testing without gpu turned on. I intend to use an emulator without gpu as my major test case for properly processing crashes which I realized is totally borked while testing this patch.

We can drop the -avd requirement though.

Of course if we spent some time making sure we match the command line arguments against the psutil command line list we could drive from there. It just seemed to be fraught with complications of which there seemed to be enough already.

> @@ +1763,5 @@
> > +
> > +        self._logger.debug('ADBDevice._start_emulator: pid: %s, name: %s' % (
> > +            self._emulator_proc.pid, self._emulator_proc.name()))
> > +
> > +    def stop_emulator(self):
> 
> I am offended by the asymmetry: _start vs stop.
> 

Heh. I did the public stop along with the public is_emulator so I could call it from the worker to stop it before terminating the worker thread during my attempts to fix that shutdown process not found error. 

If you want to revert those changes, I'm fine with that. I almost removed _start altogether since it really is a one line call to Popen....

> @@ +1778,5 @@
> > +        if not self._is_emulator:
> > +            raise ADBError("Is not an emulator.")
> > +
> > +        try:
> > +            self.command_output(["reboot"], timeout=1.0)
> 
> I have mixed feelings about this: I'm not sure that it does anything useful.
> 

The reboot part? Me too. I almost removed it. Let's go ahead and drop it.

> @@ +1792,5 @@
> > +        self._get_emulator_process()
> > +
> > +        # Kill the emulator and check/wait for emulator termination
> > +        if self._emulator_proc:
> > +            self._emulator_proc.terminate()
> 
> You didn't like "adb emu kill"? It seemed like the civilized way to bring
> down the emulator...but this is simpler.

I did that while I was chasing down zombies. It seemed that if we had the process already, we should use it.
My emulator/avd hang (comment 9, comment 11) re-appeared and I spent some time testing/debugging:
 - this happens independent of autophone; it happens while running robocop tests, over adb or sut
 - it happens with different emulator and avd versions, even those from production (2.3)
 - emulator output from "-debug all" does not show any errors
 - logcat frequently ends with either:
10-27 15:33:20.436 W/EGL_emulation( 1789): eglSurfaceAttrib not implemented
10-27 15:33:20.446 D/OpenGLRenderer( 1789): Enabling debug mode 0
10-27 15:33:20.545 I/Process (   78): Sending signal. PID: 1789 SIG: 3
10-27 15:33:20.545 I/dalvikvm( 1789): threadid=3: reacting to signal 3

or some other message about EGL or libEGL.
(In reply to Bob Clary [:bc:] from comment #21)
> (In reply to Geoff Brown [:gbrown] from comment #20)
> > Comment on attachment 8511203 [details] [diff] [review]
> > bug-1078645-v3.patch
> > @@ +594,5 @@
> > > +            cmd = ' '.join(proc.cmdline())
> > > +            if (cmd and 'emulator' in proc.name() and ' -avd ' in cmd):
> > > +                if port_arg in cmd:
> > > +                    # Kill the emulator if it is not running with gpu
> > > +                    if self.gpu_flag not in cmd:
> > 
> > Similar concern here: I don't think we need to require -avd and -gpu. I
> > prefer to say "start the emulator in such a way that you can run fennec --
> > and check that it can!" and keep the code flexible.
> > 
> I want to make sure we don't *ever* accidentally identify a process as an
> emulator process just because it happens to have 'emulator' in the command
> line and the port number somewhere in the command line.
 ...
> We can drop the -avd requirement though.

OK. Let's drop -avd and keep the rest.

> Of course if we spent some time making sure we match the command line
> arguments against the psutil command line list we could drive from there. It
> just seemed to be fraught with complications of which there seemed to be
> enough already.

I think we talked about this on irc and concluded that we should match the command line, but today I don't see the value and would prefer to avoid the pitfalls.
 
> > @@ +1763,5 @@
> > > +
> > > +        self._logger.debug('ADBDevice._start_emulator: pid: %s, name: %s' % (
> > > +            self._emulator_proc.pid, self._emulator_proc.name()))
> > > +
> > > +    def stop_emulator(self):
> > 
> > I am offended by the asymmetry: _start vs stop.
> > 
> 
> Heh. I did the public stop along with the public is_emulator so I could call
> it from the worker to stop it before terminating the worker thread during my
> attempts to fix that shutdown process not found error. 
> 
> If you want to revert those changes, I'm fine with that. I almost removed
> _start altogether since it really is a one line call to Popen....

No - let's leave this as it is.
 
> > @@ +1778,5 @@
> > > +            self.command_output(["reboot"], timeout=1.0)
> > 
> > I have mixed feelings about this: I'm not sure that it does anything useful.
> > 
> The reboot part? Me too. I almost removed it. Let's go ahead and drop it.

OK - drop the reboot.
 
> > You didn't like "adb emu kill"? It seemed like the civilized way to bring
> > down the emulator...but this is simpler.
> 
> I did that while I was chasing down zombies. It seemed that if we had the
> process already, we should use it.

Agreed!
...v3 with those minor changes.
Attachment #8508956 - Attachment is obsolete: true
Attachment #8511203 - Attachment is obsolete: true
This fixes the process error on shutdown. It applies on top of the v4 patch. I'll attach a full v5 patch.
Attached patch bug-1078645-v5.patch (obsolete) — Splinter Review
This works pretty well for me in testing so far. gbrown, can you put it through some tests and see what you think?
notes:
* running emulators fails on OS X 10.9.5 with psutil.AccessDenied errors.
* the emulator command needs to have repeated spaces reduced to one.
Comment on attachment 8513488 [details] [diff] [review]
bug-1078645-v5.patch

Review of attachment 8513488 [details] [diff] [review]:
-----------------------------------------------------------------

This works great for me. I can't break it and the multiprocessing error is gone.
great. I'll see about the osx issue and update the docs. I think we'll be ok even without an osx resolution at the moment, but it would be good to get if we can easily do it.
Attached patch bug-1078645-v6.patch (obsolete) — Splinter Review
In addition to adding support for emulators, this fixes some long standing issues related to the multiprocessing and child processes. This will hopefully eliminate the need to manually kill adb and autophone.py when restarting after terminating via Control-C as well as help with some of the adb issues that may have been due to orphaned or zombied processes.

gbrown, if you could test this as well I would appreciate it.

mcote; another monster patch. Sorry. You can always delegate to gbrown or wlach. ;-)

wlach: This isn't quite ready for uplifting to adbdevice.adb but I think it would be a good addition. You can save detailed review comments for when we do uplift but if you want to provide some feedback now that would be appreciated.

Tested on Linux and OSX 10.9.5

Added new command line option to specify adb_timeout for
ADBDevice timeout.

Updated INSTALL.md to reflect emulator lessons learned.

Cleaned up autophone, server, buildcache server start, stop to
better deal with exceptions during initialization and to localize
termination processing.

Separated initial use of ADBDevice in autophone.py from that in
worker.py so that the ADBDevice instance created in autophone.py is stopped on the main process before the ADBDevice instance in the worker is created in
the the worker's process.

worker.py can not be a multiprocessing daemon, since it starts
processes such as adb, unit tests and emulators. Since worker.py
is no longer a daemon, we need to make sure our child
processes are cleaned up. Introduced processcleaner.py which will
track registered processes and kill/reap them as needed.

Note it is now possible to stop/start the adb server on both
Linux and OS X.

Better handle case where worker stop fails when multiprocessing
loses the worker subprocess or otherwise thinks the worker is not
alive but the process still exists.

Added emulator support. Used a dedicated _reap_emulator method to
make certain that emulator processes are cleaned up immediately in
order that a new emulator process is not started while an old
emulator holds onto the disk images.

Note that we simply allow the emulator process to write to
stdout/stderr without trying to capture it to a logfile. Trying
to asynchronously log stdout/stderr from the emulator would
require threading and was deemed too risky with too little
benefit for now.
Attachment #8516930 - Flags: review?(mcote)
Attachment #8516930 - Flags: review?(gbrown)
Attachment #8516930 - Flags: feedback?(wlachance)
Assignee: gbrown → bclary
Status: NEW → ASSIGNED
Comment on attachment 8516930 [details] [diff] [review]
bug-1078645-v6.patch

Indeed, this is a perfect opportunity to get gbrown further up to speed on Autophone. ;)
Attachment #8516930 - Flags: review?(mcote)
Comment on attachment 8516930 [details] [diff] [review]
bug-1078645-v6.patch

Review of attachment 8516930 [details] [diff] [review]:
-----------------------------------------------------------------

::: INSTALL.md
@@ +148,5 @@
>  than the emulator battery charge (eg. device_battery_min=40).
>  
>  Also be aware that instead of rebooting the device, autophone will kill
> +and restart the emulator. When using an emulator, be sure to specify
> +"-port <port>" and "-gpu on" when starting each emulator instance; this

We already basically said this, above.

@@ +158,5 @@
> +that you stop all running emulators before attempting to run Autophone
> +using emulators.
> +
> +On OS X when the emulator is restarted in the worker subprocess, the
> +firewall may display a dialog to asking allow incoming connections for

s/to asking allow/asking to allow/ ?

@@ +160,5 @@
> +
> +On OS X when the emulator is restarted in the worker subprocess, the
> +firewall may display a dialog to asking allow incoming connections for
> +the python process. This can cause problems if it is not allowed
> +quickly enough. If you have start up problems, you may try disabling

s/you may try/consider/ ?

::: adb.py
@@ +1854,5 @@
> +            raise ADBError('Is not an emulator.')
> +
> +        # Kill the emulator and check/wait for emulator termination
> +        if self._emulator_proc and self._emulator_proc.is_running():
> +                self._logger.debug('ADBDevice.stop_emulator: emu kill %s' % self._emulator_proc)

excessive indent in this block

@@ +1948,5 @@
> +            status = None
> +            self._logger.exception('ADBDevice._reap_emulator: ')
> +
> +        if is_running and status == psutil.STATUS_ZOMBIE:
> +            self._logger.debug('ADBDevice._reap_emulator: Vampire Zombie!')

maybe something more descriptive would be helpful here :)

@@ +1949,5 @@
> +            self._logger.exception('ADBDevice._reap_emulator: ')
> +
> +        if is_running and status == psutil.STATUS_ZOMBIE:
> +            self._logger.debug('ADBDevice._reap_emulator: Vampire Zombie!')
> +            raise ADBError('ADBDevice: commiting suicide to free the zombie!')

s/commiting/committing/

::: adb_android.py
@@ +147,5 @@
> +        if not self._is_emulator:
> +            self.command_output(["reboot"], timeout=timeout)
> +        else:
> +            # adb reboot fails on the Android emulator, but stopping
> +            # emulator abrubtly can potentially cause issues. Issue

s/abrubtly/abruptly/

I have never noticed a problem running without the reboot. Is a 30 second delay warranted?
(In reply to Geoff Brown [:gbrown] from comment #32)
> Comment on attachment 8516930 [details] [diff] [review]
> bug-1078645-v6.patch
> 
> Review of attachment 8516930 [details] [diff] [review]:

Did you forget to r- or r+ ?

> I have never noticed a problem running without the reboot. Is a 30 second
> delay warranted?

probably not. I can cut it down to a very short period if you like. 5 seconds? Or remove it altogether?
(In reply to Bob Clary [:bc:] from comment #33)
> Did you forget to r- or r+ ?

No - I need to test more.

> > I have never noticed a problem running without the reboot. Is a 30 second
> > delay warranted?
> 
> probably not. I can cut it down to a very short period if you like. 5
> seconds? Or remove it altogether?

5 seconds sounds good.
I managed to get this on one run:

2014-11-05 17:16:54,203|INFO|AutoPhone terminated.
2014-11-05 17:16:54,203|INFO|Shutting down build-cache server...
2014-11-05 17:16:54,344|INFO|Waiting for downloads to complete...
2014-11-05 17:16:54,349|DEBUG|gbrown-emu|12235|ADBDevice.stop_emulator: emu kill failed, killing psutil.Popen(pid=12238, name='emulator')
2014-11-05 17:16:54,349|DEBUG|gbrown-emu|12235|ADBDevice._reap_emulator: process is gone.
2014-11-05 17:16:54,349|DEBUG|gbrown-emu|12235|processcleaner(12235,kill=True) killing psutil.Popen(pid=12238, name='emulator') pid=12238
2014-11-05 17:16:54,350|DEBUG|gbrown-emu|12235|processcleaner(12235,kill=True) killing psutil.Process(pid=12387 (terminated)) pid=12387
2014-11-05 17:16:54,350|ERROR|register_cmd:
Traceback (most recent call last):
  File "autophone.py", line 426, in register_cmd
    self.create_worker(phone)
  File "autophone.py", line 392, in create_worker
    worker.start()
  File "/home/gbrown/autophone05/worker.py", line 105, in start
    self.subprocess.start(phone_status)
  File "/home/gbrown/autophone05/worker.py", line 216, in start
    self.p.start()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 126, in __init__
    code = process_obj._bootstrap()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 276, in _bootstrap
    traceback.print_exc()
  File "/usr/lib/python2.7/traceback.py", line 232, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/lib/python2.7/traceback.py", line 125, in print_exception
    print_tb(tb, limit, file)
  File "/usr/lib/python2.7/traceback.py", line 69, in print_tb
    line = linecache.getline(filename, lineno, f.f_globals)
  File "/usr/lib/python2.7/linecache.py", line 14, in getline
    lines = getlines(filename, module_globals)
  File "/usr/lib/python2.7/linecache.py", line 40, in getlines
    return updatecache(filename, module_globals)
  File "/usr/lib/python2.7/linecache.py", line 133, in updatecache
    lines = fp.readlines()
  File "/home/gbrown/autophone05/worker.py", line 714, in sigterm_handler
    sys.exit(15)
SystemExit: 15
2014-11-05 17:16:54,351|DEBUG|gbrown-emu|12129|PhoneWorker:stop
2014-11-05 17:16:54,351|DEBUG|gbrown-emu|12235|PhoneWorkerSubProcess:stop
2014-11-05 17:16:54,351|DEBUG|gbrown-emu|12235|PhoneWorkerSubProcess:stop p.terminate()
2014-11-05 17:16:54,351|ERROR|gbrown-emu|12235|Terminating worker process 12235
Traceback (most recent call last):
  File "/home/gbrown/autophone05/worker.py", line 228, in stop
    self.p.terminate()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 137, in terminate
    self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'
Cool. Those are actually normal. I could suppress them to make it less confusing.

(In reply to Geoff Brown [:gbrown] from comment #35)
> I managed to get this on one run:
> 
> 2014-11-05 17:16:54,203|INFO|AutoPhone terminated.
> 2014-11-05 17:16:54,203|INFO|Shutting down build-cache server...
> 2014-11-05 17:16:54,344|INFO|Waiting for downloads to complete...
> 2014-11-05 17:16:54,349|DEBUG|gbrown-emu|12235|ADBDevice.stop_emulator: emu
> kill failed, killing psutil.Popen(pid=12238, name='emulator')

emu kill is marked as failing if the process is still running after the command. If it is, then I go ahead and kill it. I could try to wait a bit before checking to give it more time to update the process table.

> 2014-11-05 17:16:54,349|DEBUG|gbrown-emu|12235|ADBDevice._reap_emulator:
> process is gone.

That is just letting us know the process is no longer hanging around to cause problems.

> 2014-11-05
> 17:16:54,349|DEBUG|gbrown-emu|12235|processcleaner(12235,kill=True) killing
> psutil.Popen(pid=12238, name='emulator') pid=12238
> 2014-11-05
> 17:16:54,350|DEBUG|gbrown-emu|12235|processcleaner(12235,kill=True) killing
> psutil.Process(pid=12387 (terminated)) pid=12387

normal debug messages about killing emulators on shutdown.

> 2014-11-05 17:16:54,350|ERROR|register_cmd:
> Traceback (most recent call last):
>   File "autophone.py", line 426, in register_cmd
>     self.create_worker(phone)
>   File "autophone.py", line 392, in create_worker
>     worker.start()
>   File "/home/gbrown/autophone05/worker.py", line 105, in start
>     self.subprocess.start(phone_status)
>   File "/home/gbrown/autophone05/worker.py", line 216, in start
>     self.p.start()
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
>     self._popen = Popen(self)
>   File "/usr/lib/python2.7/multiprocessing/forking.py", line 126, in __init__
>     code = process_obj._bootstrap()
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 276, in
> _bootstrap
>     traceback.print_exc()
>   File "/usr/lib/python2.7/traceback.py", line 232, in print_exc
>     print_exception(etype, value, tb, limit, file)
>   File "/usr/lib/python2.7/traceback.py", line 125, in print_exception
>     print_tb(tb, limit, file)
>   File "/usr/lib/python2.7/traceback.py", line 69, in print_tb
>     line = linecache.getline(filename, lineno, f.f_globals)
>   File "/usr/lib/python2.7/linecache.py", line 14, in getline
>     lines = getlines(filename, module_globals)
>   File "/usr/lib/python2.7/linecache.py", line 40, in getlines
>     return updatecache(filename, module_globals)
>   File "/usr/lib/python2.7/linecache.py", line 133, in updatecache
>     lines = fp.readlines()
>   File "/home/gbrown/autophone05/worker.py", line 714, in sigterm_handler
>     sys.exit(15)
> SystemExit: 15

This whole stack is a result of forcing the worker to kill itself via sys.exit(15). We could make this nicer, less noisy.

> 2014-11-05 17:16:54,351|DEBUG|gbrown-emu|12129|PhoneWorker:stop
> 2014-11-05 17:16:54,351|DEBUG|gbrown-emu|12235|PhoneWorkerSubProcess:stop
> 2014-11-05 17:16:54,351|DEBUG|gbrown-emu|12235|PhoneWorkerSubProcess:stop
> p.terminate()

The order here is a little confusing. The exception AttributeError: 'NoneType' object has no attribute 'terminate' happens first. For some reason, both before the emulator patch and even after, multiprocessing loses track of the worker process sometimes during terminate.

> 2014-11-05 17:16:54,351|ERROR|gbrown-emu|12235|Terminating worker process
> 12235

This actually happens after the AttributeError: 'NoneType' object has no attribute 'terminate' and means that even though multiprocessing lost track of the worker process, we are going to kill it ourselves so it doesn't become and orphan.

> Traceback (most recent call last):
>   File "/home/gbrown/autophone05/worker.py", line 228, in stop
>     self.p.terminate()
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 137, in
> terminate
>     self._popen.terminate()
> AttributeError: 'NoneType' object has no attribute 'terminate'

I could make this less noisy, but it was helpful in figuring out what the hell multiprocessing was doing.

The real test is after autophone stops. Does 

ps -e -o pid,ppid,command|grep -E '(autophone.py|adb|emulator)'

show any orphaned or zombied processes?
Comment on attachment 8516930 [details] [diff] [review]
bug-1078645-v6.patch

Review of attachment 8516930 [details] [diff] [review]:
-----------------------------------------------------------------

No orphans, no zombies. And I can still run tests on the emulator.
Attachment #8516930 - Flags: review?(gbrown) → review+
(In reply to Bob Clary [:bc:] from comment #36)
> Cool. Those are actually normal. I could suppress them to make it less
> confusing.

I wouldn't worry much about suppressing those messages. If I wasn't testing shutdown, I probably would not have been concerned. If anything, I would look at reducing the severity of the ERROR messages.
I'll make them less ERRORy and perhaps remove the sys.exit from the worker sigterm handler since it is probably redundant. I probably have a number of belt+suspender combinations I introduced trying to catch all of the cases where we left orphans and zombies. A main pain point for me in the past even before the emulator work was having to kill autophone.py or adb in order to free up the ports used by autophone in order to start it again. I felt this would be a major pita to developers trying to use autophone.

I was thinking the multiprocessing terminate error is a long standing issue and is probably indicative of a problem with restarting a dead worker and not keeping the multiprocessing process updated. I'll update the patch with your suggestions and we can work on that later.

Thanks!
Attachment #8511204 - Attachment is obsolete: true
Attachment #8512291 - Attachment is obsolete: true
Attachment #8513487 - Attachment is obsolete: true
Attachment #8513488 - Attachment is obsolete: true
Comment on attachment 8516930 [details] [diff] [review]
bug-1078645-v6.patch

This continues to have problems with orphaned processes on osx. I'm going to put this on hold for now.
Attachment #8516930 - Flags: review+
Attachment #8516930 - Flags: feedback?(wlachance)
Attachment #8516930 - Attachment is obsolete: true
work in process, not ready for prime time.
Blocks: 1147918
Assignee: bob → nobody
No longer blocks: 1147918
Status: ASSIGNED → NEW
No longer blocks: 1036773
Autophone is going away. Resolving these to wontfix.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: