Closed Bug 624001 Opened 9 years ago Closed 9 years ago

devicemanager.py needs to be updated for all branches and it needs more robust error handling

Categories

(Testing :: General, defect)

x86
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: jmaher)

References

Details

Attachments

(14 files, 11 obsolete files)

1.95 KB, patch
mcote
: review+
Details | Diff | Splinter Review
36.45 KB, patch
cmtalbert
: review+
Details | Diff | Splinter Review
43.86 KB, patch
mcote
: review+
jmaher
: feedback-
Details | Diff | Splinter Review
552 bytes, patch
bear
: review+
Details | Diff | Splinter Review
693 bytes, patch
mcote
: review+
Details | Diff | Splinter Review
641 bytes, patch
mcote
: review+
Details | Diff | Splinter Review
48.39 KB, patch
anodelman
: review+
Details | Diff | Splinter Review
765 bytes, patch
cmtalbert
: review+
Details | Diff | Splinter Review
4.31 KB, patch
cmtalbert
: review+
Details | Diff | Splinter Review
1.83 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
5.27 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
2.92 KB, patch
mcote
: review+
Details | Diff | Splinter Review
4.13 KB, patch
bear
: review+
Details | Diff | Splinter Review
4.72 KB, patch
mcote
: review+
Details | Diff | Splinter Review
currently devicemanager.py is out of sync between remote-testing, mozilla-central and talos.  This is trivial to get them to the same code base.  

Another work item is to change devicemanger.py to have a more robust and documented error handling system.  Part of this is a retry limit for connecting to the device (i.e. dead device and scripts continue on).  This is an attempt to put a framework in place to cleanup return values and error handling in general.
Attachment #502078 - Flags: review?(mcote)
first patch uploaded, another patch will be there after some more testing.
Comment on attachment 502078 [details] [diff] [review]
remote-testing sync patch to other branches (1.0)

Yup yup.
Attachment #502078 - Flags: review?(mcote) → review+
this patch is the second patch in a series which adds a lot of devicemanager error handling.  This requires the remote automation to be cleaned up a bit, but is a lot more robust.

Although this applied to talos, both talos and remote-testing share the same devicemanager.py.
Assignee: nobody → jmaher
Attachment #502129 - Flags: review?(mcote)
I should have put a remote-testing version up first.  This is the most familiar for you mcote and will allow us to get it checked in easier.  I have verified this with the unit tests and it also works for talos.
Attachment #502129 - Attachment is obsolete: true
Attachment #502470 - Flags: review?(mcote)
Attachment #502129 - Flags: review?(mcote)
Comment on attachment 502470 [details] [diff] [review]
remote-testing devicemanager error handling (1.0)

Looks a lot more robust now with clearer failures... I didn't see any big problems; some comments below.


+  def sendCMD_loop(self, cmdline, newline = True):

The name of this function is a bit confusing; to me, it sounds like this function is the one doing the retry loop.  But maybe that's just me.

+  # external function
+  # returns:
+  #  success: True
+  #  failure: False #bah, I would rather see other than true/false return values
   def pushFile(self, localname, destname):

You could, later perhaps, implement an object variable called something like 'lasterr' that is set to a standard error code from the 'errno' module when there's an error; that wouldn't require any further changes to external interfaces--and it could be set in the appropriate function where the error is detected (socket errors in sendCMD(), file-system errors in pushFile() and similar functions, etc.).  For another patch if we felt like doing it.

   # make directory structure on the device
+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None

I guess this isn't new, but the success return value is strange.  Why is it not the directory name, like in mkDir() above?  Same with pushDir().

+    if (self.agentErrorRE.match(data)):
+      return []

After seeing this over and over, I can't help but think that perhaps this would be better placed in sendCMD(), maybe with an optional parameter to enable the check... it could throw DMError, with an appropriate message.

   def removeDir(self, remoteDir):
-    self.sendCMD(['rmdr ' + remoteDir])
+    try:
+      retVal = self.sendCMD(['rmdr ' + remoteDir])
+    except:
+      return None

This should probably be looking for DMError and not a bare except clause.

+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None
   def poll(self, process):

Another weird success value.

     procre = re.compile('.*' + app + '.*')
 
     procList = self.getProcessList()
-    if (procList == None):
+    if (procList == []):
       return ''

Since getProcessList() now returns an empty list instead of None, technically this check is unneeded, since the for loop that follows will not be executed at all if procList is an empty list, so the returned pid will still be ''.

+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None
   def killProcess(self, appname):

Weird success value.

+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None
   def reboot(self, ipAddr=None, port=30000):

Ditto.

So yeah, maybe a few things to clean up, but generally good.
Attachment #502470 - Flags: review?(mcote) → review+
I actually got all the changes requested, but enough changed that it should have somebody sanity check it.
Attachment #502470 - Attachment is obsolete: true
Attachment #502550 - Flags: review?(mcote)
ok, for final review
Attachment #502550 - Attachment is obsolete: true
Attachment #502589 - Flags: review?(mcote)
Attachment #502550 - Flags: review?(mcote)
fixed bogus = to be ==
Attachment #502589 - Attachment is obsolete: true
Attachment #502602 - Flags: review?(mcote)
Attachment #502589 - Flags: review?(mcote)
Comment on attachment 502602 [details] [diff] [review]
remote-testing devicemanager error handling (1.3)


+  # external function
+  # returns:
+  #  success: array of process tuples
+  #  failure: []
   def getProcessList(self):
-    data = self.sendCMD(['ps'])
-    if (data == None):
-      return None
-      
+    try:
+      data = self.sendCMD(['ps'])
+    except:
+      return []
+

Hm I might have missed this earlier, but that should probably not be a bare except.

Everything else looks good, except there's one thing that concerns me: the only call to _sendCMD() is from sendCMD(), so ignoreAgentErrors in _sendCMD() is never being set to False.  Are there in fact no functions that actually need to bypass the error checking then?  That's the only reason I suggested making it optional: in the original code, there were various calls to sendCMD() that didn't check the returned data for agentErrorRE matches.  But now they all check, since they call sendCMD() which calls _sendCMD() with ignoreAgentErrors being True.

Waiting on r+ 'til I hear back...
Sorry, slight modification to what I said: there are no calls to sendCMD() that actually do *check* for agentErrorRE matches, since ignoreAgentErrors is always True.  My thoughts were to have sendCMD() call _sendCMD() with ignoreAgentErrors set to False, since normally we want to check for the error string, and any functions that need to ignore agent errors call _sendCMD() directly.
:mcote, could you tie up the loose ends on this patch and get :ctalbert to r+ it?  I would like to keep this ball rolling and at the very least get this into r-t and m-c this week.  I can do the talos stuff next week!
Sure will do.  I'm not entirely sure which functions should have the agentErrorRE checks and which shouldn't, so, unless it's obvious, I'll just fix the patch so that the functions that used to check it will now call the version of sendCMD() that does the check.  We can add more later if necessary.
Fixes on top of jmaher's last patch:

- new wrapper function is now called verifySendCMD() and old function is still sendCMD(), so that any functions not explicitly calling the former get the old functionality (ignoring agent errors)
- fixed wrapper function to return result from sendCMD()
- replaced all existing agentErrorRE checks and those added in jmaher's first patch with called to verifySendCMD()
- changed a bare "except" to "except DMError"

All unit tests pass except test_inst, which has problems unrelated to the device manager.
Attachment #502602 - Attachment is obsolete: true
Attachment #502866 - Flags: review?(ctalbert)
Attachment #502602 - Flags: review?(mcote)
Comment on attachment 502866 [details] [diff] [review]
remote-testing devicemanager error handling (1.4)

I like the purpose and cleanup done in this patch. It's been needed for a while.  I do have some concerns though:
>diff --git a/tests/devicemanager.py b/tests/devicemanager.py
>--- a/tests/devicemanager.py
>+++ b/tests/devicemanager.py
> class DeviceManager:
>   host = ''
>   port = 0
>   debug = 2 
>-  _redo = False
>+  retrylimit = 5
>+  retries = 0
>   deviceRoot = None
>   tempRoot = os.getcwd()
>   base_prompt = '$>'
>   base_prompt_re = '\$\>'
>   prompt_sep = '\x00'
>   prompt_regex = '.*(' + base_prompt_re + prompt_sep + ')'
>   agentErrorRE = re.compile('^##AGENT-WARNING##.*')
> 
>   def __init__(self, host, port = 20701):
I thought that Aki had specifically requsted that we make the retrylimit configurable?  I understand that the user of DeviceManager could do something like: dm.retrylimit = 11, but it would be nicer (in my mind) to inlcude this option as a parameter on the contstuctor.
It could still default to 5 of course.

>+  def _doCMD(self, cmdline, newline = True):
>     promptre = re.compile(self.prompt_regex + '$')
>     data = ""
>     shouldCloseSocket = False
>     recvGuard = 1000
> 
>     for cmd in cmdline:
>       if newline: cmd += '\r\n'
>       
>       try:
>+        self.retries = 0
This is kind of scary.  So sendCMD calls doCMD in the "retry" while loop.  However, doCMD resets self.retries to 0.
I think what you're doing here is ensuring that each command on the command line is retried "retrylimit" times.  It kind of makes my skin crawl to set the outer loop's control variable from within the funciton being called.  Perhaps a comment here so we'll remember why this is here in the future?  I don't see any better way to do this.
One way this could fail pretty spectacularly though is the following scenario:
1. PushFile <someHugeFile>
2. Push turns into two "commands" - push /sdcard/myhugefile.txt <file contents>
3. If the <file contents> side of the command fails to go through, then how do we know what will get retried?  Ideally the entire "push" command should fail and be retried as a unit.  I don't want to find out that we're retrying the push code and making the agent append file bits to itself a number of times.
Not sure how to test this unforutnately to see would happen in that failure case.

>+  # external function
>+  # returns:
>+  #  success: True
>+  #  failure: False #bah, I would rather see other than true/false return values
What else would you do?  This is a lot better than returning '' vs None, you have to admit.
If there's something better, then why aren't we doing it? FWIW, I like the True/False return on this function.
>   def pushFile(self, localname, destname):
 
>   # push localDir from host to remoteDir on the device
>+  # external function
>+  # returns:
>+  #  success: remoteDir
>+  #  failure: None
>   def pushDir(self, localDir, remoteDir):
>-    if (self.debug >= 2): print "pushing directory: " + localDir + " to " + remoteDir
>+    if (self.debug >= 2): print "pushing directory: %s to %s" % (localDir, remoteDir)
>     for root, dirs, files in os.walk(localDir):
>       parts = root.split(localDir)
>       for file in files:
>         remoteRoot = remoteDir + '/' + parts[1]
>         remoteName = remoteRoot + '/' + file
>         if (parts[1] == ""): remoteRoot = remoteDir
>         if (self.pushFile(os.path.join(root, file), remoteName) == None):
Push file is not going to return None.  It's going to return False ^^^
>+          #NOTE: we do not check the return value here.  This is a sanity check and the file could not exist
Is this referring to the removeFile below?  At this point, the push has failed the first time, and I don't think we care if the remove does or doesn't succeed.  We are just removing in case there was a partial push and we're going to try one more time.
>           self.removeFile(remoteName)
>           if (self.pushFile(os.path.join(root, file), remoteName) == None):
>             return None

>+  # external function
>+  # returns:
>+  #  success: pid
>+  #  failure: None
>   def fireProcess(self, appname):
>     if (self.debug >= 2): print "FIRE PROC: '" + appname + "'"
>     
>     if (self.processExist(appname) != ''):
>       print "WARNING: process %s appears to be running already\n" % appname
>     
>-    self.sendCMD(['exec ' + appname])
>+    try:
>+      data = self.verifySendCMD(['exec ' + appname])
>+    except(DMError):
>+      return None
> 
>-    #NOTE: we sleep for 30 seconds to allow the application to startup
>-    time.sleep(30)
>+    #NOTE: we sleep to allow the application to startup (it could take >3 seconds, but end in <15)
>+    timeslept = 0
>+    while (timeslept <= 30):
>+      self.process = self.processExist(appname)
>+      if (self.process is not ''):
>+        break
>+      time.sleep(3)
>+      timeslept += 3
I'm glad you've cleaned this up.  Much nicer.
> 
>-    self.process = self.processExist(appname)
>     if (self.debug >= 4): print "got pid: " + str(self.process) + " for process: " + str(appname)
>+    return self.process
If the process did not end up starting then self.process is '', not None as you claim you will return in your comment.  It looks like you intend for it to be None if self.ProcessExist finds that the process does not exist.  That change will require changes to a lot of upstream code, but I tend to think that since we're here doing this we should go ahead and fix this now.
We need to change the above if statement to look for None instead of '', and we need to change processExist further down in the code and file a follow up bug to change the rest of the scripts that depend on this API. 
We've already shot ourselves in the foot with this one a couple of times debugging a return of '' vs. None, so I think it'd be better to bite the bullet and change it now.
 
>+  # external function
>+  # returns:
>+  #  success: output filename
>+  #  failure: None
>   def launchProcess(self, cmd, outputFile = "process.txt", cwd = '', env = ''):
>-    self.fireProcess(cmdline)
>+    if self.fireProcess(cmdline) is None:
>+      return None
This will fail to capture a failure unless we change processExist, so more reason to do it now.

>+  #TODO: this might not be used
>   #hardcoded: sleep interval of 5 seconds, timeout of 10 minutes
>+  # external function
>+  # returns:
>+  #  success: [file contents, None]
>+  #  failure: [None, None]
I think this is used by some of the RProcess stuff in remote-automation.py
>   def communicate(self, process, timeout = 600):
>     interval = 5
>     timed_out = True
>     if (timeout > 0):
>       total_time = 0
>       while total_time < timeout:
>         time.sleep(interval)
>-        if (not self.poll(process)):
>+        if self.poll(process):
>           timed_out = False
>           break
This looks wrong.  Communicate should poll the process and only return when the process stops running or if the operation times out.  If the process stops running then timed_Out should be false and we should break out of the loop.  Here, if the process is running, you'll immediately break out of the loop, which is not what we want at all.

>         total_time += interval
> 
>     if (timed_out == True):
>-      return None
>+      return [None, None]
> 
>     return [self.getFile(process, "temp.txt"), None]
> 
> 
>+  # external function
>+  # returns:
>+  #  success: True
>+  #  failure: None #None for legacy reasons
>   def poll(self, process):
>     try:
>       if (self.processExist(process) == ''):
>         return None
>-      return 1
>+      return True
>     except:
>       return None
>-    return 1
>+    return True
Why exactly does this function exist?  It's basically just calling processExist and chnging its return value to a True/None return value.
Let's just fix processExist and remove this function?

>   
>   # iterates process list and returns pid if exists, otherwise ''
>+  # external function
>+  # returns:
>+  #  success: pid
>+  #  failure: ''  #TODO: make this a failure of 'None' to match everything else
Yes, let's change that return value now and file a bug to change upstream consumers. Also change the above comment that explains the functionality so that it matches

>+  #
>+  # external function
>+  # returns:
>+  #  success: path for device root
>+  #  failure: None
>   def getDeviceRoot(self):
>     # This caching of deviceRoot is causing issues if things fail
>     # if (not self.deviceRoot):
Let's take that ^ line out.  We don't use self.deviceRoot anywhere else in the devicemanager and all our consumers call this function when they want it.  So let's just stop caching as an attribute on the class.
>-    data = self.sendCMD(['testroot'])
>-    if (data == None):
>-      return '/tests'
>+    try:
>+      data = self.verifySendCMD(['testroot'])
>+    except:
>+      return None
>+  
>     self.deviceRoot = self.stripPrompt(data).strip('\n') + '/tests'
I don't think there's any need to cache this on the class anymore ^ just do deviceRoot = <blah>
> 
>     if (not self.dirExists(self.deviceRoot)):
>-      self.mkDir(self.deviceRoot)
>+      if (self.mkDir(self.deviceRoot) == None):
>+        return None
> 
>     return self.deviceRoot
> 
>   # Either we will have /tests/fennec or /tests/firefox but we will never have
>   # both.  Return the one that exists
>+  # TODO: ensure we can support org.mozilla.firefox
See below
>+  # external function
>+  # returns:
>+  #  success: path for app root
>+  #  failure: None
>   def getAppRoot(self):
>-    if (self.dirExists(self.getDeviceRoot() + '/fennec')):
>-      return self.getDeviceRoot() + '/fennec'
>-    elif (self.dirExists(self.getDeviceRoot() + '/firefox')):
>-      return self.getDeviceRoot() + '/firefox'
>+    devroot = self.getDeviceRoot()
>+    if (devroot == None):
>+      return None
>+
>+    if (self.dirExists(devroot + '/fennec')):
>+      return devroot + '/fennec'
>+    elif (self.dirExists(devroot + '/firefox')):
>+      return devroot + '/firefox'
>     else:
>       return 'org.mozilla.fennec'
Replace the else with: 
elif (self.dirExsts('/data/data/org.mozilla.fennec')):
  return 'org.mozilla.fennec'
elif (self.dirExists('/data/data/org.mozilla.firefox')):
  return 'org.mozilla.firefox'
else:
  # Failure (either not installed or not a recognized platform)
  return None
Then we properly support Android.
  
>+  # external function
>+  # returns:
>+  #  success: status from test agent
>+  #  failure: None
>   def reboot(self, ipAddr=None, port=30000):
>     cmd = 'rebt'
> 
>     if (self.debug > 3): print "INFO: sending rebt command"
>     if (ipAddr is not None):
>       ip, port = self.getCallbackIpAndPort(ipAddr, port)
> 
>       # Set up our callback server
>       callbacksvr = callbackServer(ip, port, self.debug)
>-      data = self.sendCMD([cmd])
>+      try:
>+        data = self.verifySendCMD([cmd])
>+      except(DMError):
>+        return None
>       status = callbacksvr.disconnect()
>     else:
>-      status = self.sendCMD([cmd])
>+      try:
>+        status = self.sendCMD([cmd])
>+      except(DMError):
>+        return None
>+
>     if (self.debug > 3): print "INFO: rebt- got status back: " + str(status)
>-
>-    return ''
>+    return status
The reboot function looks way complex.  Also, the case where we do not have an IP callback is using sendCMD not verifySendCMD
Why can't we:
if (ipAddr is not None):
  ip, port = self.getCallbackIpAndPort(ipAddr, port)
  callbacksvr = callbackServer(ip, port, self.debug)
try:
  status = self.verifySendCMD([cmd])
except(DMError):
  return None

if (ipAddr is not None):
  return(callbacksvr.disconnect())
else:
  return status

> 
>   # Returns information about the device:
>   # Directive indicates the information you want to get, your choices are:
>   # os - name of the os
>   # id - unique id of the device
>   # uptime - uptime of the device
>   # systime - system time of the device
>   # screen - screen resolution
>   # memory - memory stats
>   # process - list of running processes (same as ps)
>   # disk - total, free, available bytes on disk
>   # power - power status (charge, battery temp)
>   # all - all of them - or call it with no parameters to get all the information
>+  # TODO: figure out the return values for this function
Returns a dict of lists, i.e. {'os': ['os info 1', 'os info 2',...],'uptime':['uptime info1', uptime info2',...]}
For failure it should just return an empty dict this way it matches the  behavior of other functions that return lists
>   def getInfo(self, directive=None):
>     data = None
>     result = {}
>     collapseSpaces = re.compile('  +')
> 
>     directives = ['os', 'id','uptime','systime','screen','memory','process',
>                   'disk','power']
>     if (directive in directives):
>@@ -819,42 +1065,55 @@ class DeviceManager:
>   """
>   Installs the application onto the device
>   Application bundle - path to the application bundle on the device
>   Destination - destination directory of where application should be
>                 installed to (optional)
>   Returns None for success, or output if known failure
>   TODO: we need a better way to know if this works or not
I think we can remove that ^ TODO: we have the better way to know if it works, we can do that by performing the update with the callbacksvr.
>   """
>   Uninstalls the named application from device and causes a reboot.
>   Takes an optional argument of installation path - the path to where the application
>   was installed.
>   Returns True, but it doesn't mean anything other than the command was sent,
>   the reboot happens and we don't know if this succeeds or not.
>   """
>+  # external function
>+  # returns:
>+  #  success: True
>+  #  failure: False (TODO: consider changing this to not be true/false)
Why not go ahead and make this match the return behavior of installApp, so that it should return True on success and None on failure.

>   """
>   Updates the application on the device.
>   Application bundle - path to the application bundle on the device
>   Process name of application - used to end the process if the applicaiton is
>                                 currently running
>@@ -865,52 +1124,71 @@ class DeviceManager:
>   port - port to await a callback ping to let us know that the device has updated properly
>          defaults to 30000, and counts up from there if it finds a conflict
>   Returns True if succeeds, False if not
>   
>   NOTE: We have no real way to know if the device gets updated or not due to the
>         reboot that the udpate call forces on us.  We can't install our own heartbeat
>         listener here because we run the risk of racing with other heartbeat listeners.
We should remove this note now that we have a callback ability.
>   """
>+  # external function
>+  # returns:
>+  #  success: text status from command or callback server
>+  #  failure: None
>   def updateApp(self, appBundlePath, processName=None, destPath=None, ipAddr=None, port=30000):
>     status = None
>     if (ipAddr is not None):
>       ip, port = self.getCallbackIpAndPort(ipAddr, port)
> 
>       cmd += " %s %s" % (ip, port)
Interesting, does the reboot cmd use a different format if it is using the callback server?  We should check on that because in this patch the reboot command uses the same command whether or not a callback server is in place.
> 
>       # Set up our callback server
>       callbacksvr = callbackServer(ip, port, self.debug)
>-      data = self.sendCMD([cmd])
>+
>+      try:
>+        data = self.verifySendCMD([cmd])
>+      except(DMError):
>+        return None
>+
>       status = callbacksvr.disconnect()
>     else:
>-      status = self.sendCMD([cmd])
>+      try:
>+        status = self.verifySendCMD([cmd])
>+      except(DMError):
>+        return None
> 
>-    if (self.debug > 3): print "got status back: " + str(status)
>+    if (self.debug > 3): print "INFO: updateApp: got status back: " + str(status)
>+
>     return status
I feel like this function could have the same structure as the reboot method except that it changes the command run in the ipAddr case (and probably reboot should as well).  It just looks crazy to have such a small function with four different returns.
Attachment #502866 - Flags: review?(ctalbert) → review-
As discussed, this patch fixes the stuff you commented on, as well as a few more fixes and improvements.
Attachment #502866 - Attachment is obsolete: true
Attachment #503218 - Flags: review?(ctalbert)
Comment on attachment 503218 [details] [diff] [review]
remote-testing devicemanager error handling (1.5)

Looks great, as discussed on IRC let's change the mkDir and getInfo to verifySendCmd and then we'll be all done and ready to go.

Thanks for all the hard work!
Attachment #503218 - Flags: review?(ctalbert) → review+
Attached patch Patch for mcSplinter Review
This is the patch to update mozilla-central with the new devicemanager and also updates the affected uses of the changed API calls.
Attachment #503267 - Flags: review?(mcote)
Attached patch Patch for Talos (obsolete) — Splinter Review
Patch for Talos that updates devicemanager and the APIS that Talos was using that have now changed.
Attachment #503268 - Flags: review?(jmaher)
We realized during this code review that the devicemanager had parameters for ipaddress and port for a callback server on the devicemanager.reboot API.  However, the agent was not using these and the devicemanager sent the same command to the agent even if these parameters were specified.  To reduce confusion these parameters have been removed, which was the only API change that affected the tegra tools.  Patch created from Aki's user repo.
Attachment #503270 - Flags: review?(bear)
Attachment #503270 - Flags: review?(bear) → review+
Comment on attachment 503267 [details] [diff] [review]
Patch for mc

 
-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
             raise devicemanager.FileError("Failed to copy profiledir to device")
 
     def copyExtraFilesToProfile(self, options, profileDir):
         RefTest.copyExtraFilesToProfile(self, options, profileDir)
-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
             raise devicemanager.FileError("Failed to copy extra files to device") 

Hmm for better or for worse, pushDir() still returns None on failure.

         self.localProfile = options.profilePath
-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:

Ditto.

-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:

And again.

Aside from that it's all good.  I verified that, after applying the patch, devicemanager.py in build/mobile is that same as that in remote-testing/tests.
Attachment #503267 - Flags: review?(mcote) → review+
updated devicemanager to match all work done on mozilla-central and remote-testing.  Updated talos code to handle errors better.  

FYI: I haven't tested this patch as I don't have a setup on a remote device.  I tested an earlier version of this and there are a few changes (pushFile return value, removed poll and using processExist), but overall I feel comfortable that this patch is going to be ready for prime time.
Attachment #503268 - Attachment is obsolete: true
Attachment #503374 - Flags: review?(anodelman)
Attachment #503374 - Flags: feedback?(ctalbert)
Attachment #503268 - Flags: review?(jmaher)
Comment on attachment 503267 [details] [diff] [review]
Patch for mc

>diff --git a/layout/tools/reftest/remotereftest.py b/layout/tools/reftest/remotereftest.py
>-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
>+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
>             raise devicemanager.FileError("Failed to copy profiledir to device")

pushdir still returns None

> 
>     def copyExtraFilesToProfile(self, options, profileDir):
>         RefTest.copyExtraFilesToProfile(self, options, profileDir)
>-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
>+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
>             raise devicemanager.FileError("Failed to copy extra files to device") 
> 

ditto


>diff --git a/testing/mochitest/runtestsremote.py b/testing/mochitest/runtestsremote.py
>     def buildProfile(self, options):
>         manifest = Mochitest.buildProfile(self, options)
>         self.localProfile = options.profilePath
>-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
>+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:
>             raise devicemanager.FileError("Unable to copy profile to device.")

ditto

>         #we really need testConfig.js (for browser chrome)
>-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
>+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:
>             raise devicemanager.FileError("Unable to copy profile to device.")

ditto
Attachment #503267 - Flags: feedback-
Comment on attachment 503374 [details] [diff] [review]
patch for talos - devicemanager error handling and cleanup

The error handling looks good to me.  The only question I have is: if all callers to TerminateAllProcesses do not expect a return code, why did you add one?  And when you add one, you added a return -1 for the failure case, when it seems that everything else in Talos land returns 1 on failure and 0 on success.  

You now have TerminateAllProcess returning -1 on failure and None on success.  I'd at least make this match the other Talos APIs or not have it return anything at all rather than creating a third paradigm.

I'll defer to Alice for the final decision there, but that's my two cents.  Otherwise, the patch looks great.
Attachment #503374 - Flags: feedback?(ctalbert) → feedback+
for the return code from TerminateAllProcess, I added it so it is there if we need it.  I agree on the -1 vs 1 stuff, that should at least match what most other talos APIs do.  

I won't be able to run this in staging until next week, but I will do that first thing to get this ball rolling.
Attached patch fixing a mistake (obsolete) — Splinter Review
found an error on m-c.  it's also in the talos version.  Patch upcoming for talos too.
Attachment #503853 - Flags: review?(mcote)
Attachment #503853 - Attachment is obsolete: true
Attachment #503868 - Flags: review?(mcote)
Attachment #503853 - Flags: review?(mcote)
Comment on attachment 503868 [details] [diff] [review]
with 100% more content, sorry bout that

Great, good catch.
Attachment #503868 - Flags: review?(mcote) → review+
This should be applied atop attachment 503374 [details] [diff] [review], it fixes the same devicemanager typo for the talos tree.
Attachment #503955 - Flags: review?(mcote)
Comment on attachment 503955 [details] [diff] [review]
fixing the same devicemanager typo on talos

Yup yup!
Attachment #503955 - Flags: review?(mcote) → review+
I don't know if I'm the right person to handle devicemanager patches... I'm not very familiar with the code.  Might be more appropriate for bear/bmoss?
Alice, I intended for you to review the changes to the non devicemanager.py code.  Overall the devicemanager.py has had 3 people reviewing it over and over again.
I just ran this through talos staging and it is green.  Just need a sanity check review on the core talos files and we can get this landed.
I'm a little confused by this chunk:

     def TerminateAllProcesses(self, *process_names):
         """Helper function to terminate all processes with the given process name
 
         Args:
           process_name: String or strings containing the process name, i.e. "firefox"
         """
         for process_name in process_names:
             try:
-                self.testAgent.killProcess(process_name)
+                if (self.testAgent.killProcess(process_name) is None):
+                    # All code which call TerminateAllProcesses does not check for a return code
+                    return -1
             except:
                 # Might get an exception if there are no instances of the process running.
                 continue

Does this mean that we stop attempting to close processes upon finding a failed (None) killProcess?  Shouldn't we continue to close processes just for cleanliness?  Either that, or the comment is unclear as to what is happening.
updated patch to remove the early termination in TerminateAllProcesses inside ffprocess_remote.py.  The first patch went a bit overboard on error handling and that was unnecessary and doing the wrong thing.
Attachment #503374 - Attachment is obsolete: true
Attachment #504856 - Flags: review?(anodelman)
Attachment #503374 - Flags: review?(anodelman)
Attachment #504856 - Flags: review?(anodelman) → review+
one line fix to remote-testing repository for self.process->process.
Attachment #505221 - Flags: review?(ctalbert)
Duplicate of this bug: 618360
Attachment #505219 - Flags: review?(ctalbert) → review+
Comment on attachment 505221 [details] [diff] [review]
remote-testing devicemanager one line sync patch (1.0)

Looks like you have some changes here to the python agent in your diff as well.  I think those changes look fine (they all add functionality, not decrease functionality) so I'll r+ those too.
Attachment #505221 - Flags: review?(ctalbert) → review+
calls to removeDir() are getting a return of None but the output looks like it's working - see http://pastebin.mozilla.org/965728 for log output details
One-line fix in patch 503868 ("with 100% more content, sorry bout that") pushed to remote-testing as http://hg.mozilla.org/automation/remote-testing/rev/2be667ebf48e
Just found this one.  DeviceManager.isDir() isn't handling AGENT-WARNINGs correctly.  I wasn't able to reproduce such a warning (it isn't generated normally when there is no file or directory), but I saw it earlier when testing with my Fennec profile tool.  I also included an isDir unit test for completeness, though unfortunately it doesn't reproduce this particular bug.

This is a patch against remote-testing, btw.
Attachment #508578 - Flags: review?(jmaher)
Comment on attachment 508578 [details] [diff] [review]
Fixed error handling in isDir()

good catch here.  This appears solve one of the problems that :bear has seen on and off with removeDir.

For the unit test, I would like to see something like this added:
1) create dir
2) create dir/file
3) verify dir
4) verify dir/file
5) delete dir
6) verify dir == false
7) verify dir/file == false

r=me with consideration to additional unit tests
Attachment #508578 - Flags: review?(jmaher) → review+
Okay the test already does most of that; I just added 5, 6, & 7.  Though step 4 should be verify dir/file == false, since isDir() should return False for files.

Pushed as http://hg.mozilla.org/automation/remote-testing/rev/cb81f84f53c4
Here's another problem: getDirectory() aborts if it finds an empty directory.  It shouldn't.

Includes unit test (but no batteries).
Attachment #508870 - Flags: review?(jmaher)
This preserves the old behaviour in which None is returned if the directory doesn't exist but still creates the directory if it exists but is empty.  Added an optional parameter so that the recursive call doesn't call isDir() twice.
Attachment #508870 - Attachment is obsolete: true
Attachment #508890 - Flags: review?(jmaher)
Attachment #508870 - Flags: review?(jmaher)
Comment on attachment 508890 [details] [diff] [review]
Fixed getDirectory() with empty dirs

this looks good, thanks mcote
Attachment #508890 - Flags: review?(jmaher) → review+
another patch as a request from releng to have reboot wait until the device comes online before returning.
Attachment #508901 - Flags: review?(mcote)
Comment on attachment 508901 [details] [diff] [review]
add a wait in reboot until device comes back up (1.0)

Looks good, though you could tighten up that loop like so:

    if (wait == True):
      #this sleeps up to 5 minutes in 30 second intervals
      count = 0
      while (count < 10):
        if (self.debug >= 4): print "DEBUG: sleeping 30 seconds while waiting for reboot"
        time.sleep(30)
        waitstatus = self.getDeviceRoot()
        if (waitstatus is not None):
          break
        self.retries = 0
        count += 1

That gets rid of the duplicate code and should work the same way, since you are doing the sleep right away.

Is it not an error for it to time out?  We don't need to raise an exception if count == 10?

r+ though I would like to hear the answer to my last question.
Attachment #508901 - Flags: review?(mcote) → review+
asking for review since I have unit tests to add here.  To answer the questions from previous review, I have simplified the code as mentioned.  As for the error condition being an exception vs a return code, it is preferable to work with existing scripts to be a return value, not an exception.
Attachment #508901 - Attachment is obsolete: true
Attachment #509150 - Flags: review?(mcote)
Comment on attachment 509150 [details] [diff] [review]
add a wait in reboot until device comes back up (2.0)

The modifications to the reboot() command look good.  Just a couple comments about the unit test:

+        start = datetime.datetime.now()
+        info = self.dm.getInfo('uptime')
+        end = datetime.datetime.now()

On my system the first half of the unit test failed because, at some point during the reboot cycle, the reconnect attempts failed immediately, so it ran out of retries before the device came back up.  I put the getInfo() call in a loop, and it worked.  Here's what I did:

        start = datetime.datetime.now()
        count = 0
        while count < 5:
            try:
                info = self.dm.getInfo('uptime')
            except devicemanager.DMError:
                time.sleep(10)
                count += 1
            else:
                break
        self.assertTrue(count < 5)
        end = datetime.datetime.now()

This should also work for your case, in which the getInfo() command hangs until the device is back up.  Note that you'll also need an "import devicemanager" at the top of the file.  Please try this out on your device.

+        if (not matches):
+            print "error getting uptime information"

Might as well replace these two lines with

        self.assertTrue(matches, "error getting uptime information")

since, if matches == None, the test will throw an exception immediately after this anyway when it starts trying to access matches.group().

So yeah, try the unit test with my modification on your system, and if it works, we're good here.
Attachment #509150 - Flags: review?(mcote) → review+
all of these changes are on remote-testing, porting to talos.
Attachment #509473 - Flags: review?(ctalbert)
Attachment #509473 - Flags: review?(ctalbert) → review?(bear)
Attachment #509473 - Flags: review?(bear) → review+
update mozilla-central with remote-testing and talos changes to devicemanager.py
Attachment #509494 - Flags: review?(mcote)
Comment on attachment 509494 [details] [diff] [review]
mozilla-central update for reboot, getDirectory and isdir (1.0)

Looks good, and after applying this patch devicemanager.py is the same in all 3 locations.  Yay!
Attachment #509494 - Flags: review?(mcote) → review+
Component: New Frameworks → General
You need to log in before you can comment on or make changes to this bug.