Closed Bug 624001 Opened 14 years ago Closed 13 years ago

devicemanager.py needs to be updated for all branches and it needs more robust error handling

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: jmaher)

References

Details

Attachments

(14 files, 11 obsolete files)

remote-testing sync patch to other branches (1.0) 14 years ago Joel Maher ( :jmaher ) (UTC -8) 1.95 KB, patch	mcote : review+	Details \| Diff \| Splinter Review
talos devicemanager error handling 14 years ago Joel Maher ( :jmaher ) (UTC -8) 37.65 KB, patch		Details \| Diff \| Splinter Review
remote-testing devicemanager error handling (1.0) 14 years ago Joel Maher ( :jmaher ) (UTC -8) 33.76 KB, patch	mcote : review+	Details \| Diff \| Splinter Review
remote-testing devicemanager error handling (1.1) 14 years ago Joel Maher ( :jmaher ) (UTC -8) 33.37 KB, patch		Details \| Diff \| Splinter Review
remote-testing devicemanager error handling (1.2) 14 years ago Joel Maher ( :jmaher ) (UTC -8) 33.44 KB, patch		Details \| Diff \| Splinter Review
remote-testing devicemanager error handling (1.3) 14 years ago Joel Maher ( :jmaher ) (UTC -8) 33.44 KB, patch		Details \| Diff \| Splinter Review
remote-testing devicemanager error handling (1.4) 14 years ago Mark Côté [:mcote] 33.59 KB, patch	cmtalbert : review-	Details \| Diff \| Splinter Review
remote-testing devicemanager error handling (1.5) 14 years ago Mark Côté [:mcote] 36.45 KB, patch	cmtalbert : review+	Details \| Diff \| Splinter Review
Patch for mc 14 years ago cmtalbert 43.86 KB, patch	mcote : review+ jmaher : feedback-	Details \| Diff \| Splinter Review
Patch for Talos 14 years ago cmtalbert 41.40 KB, patch		Details \| Diff \| Splinter Review
Update Tegra Tools DeviceManager APIs 14 years ago cmtalbert 552 bytes, patch	bear : review+	Details \| Diff \| Splinter Review
patch for talos - devicemanager error handling and cleanup 14 years ago Joel Maher ( :jmaher ) (UTC -8) 49.30 KB, patch	cmtalbert : feedback+	Details \| Diff \| Splinter Review
fixing a mistake 13 years ago cmtalbert 19 bytes, patch		Details \| Diff \| Splinter Review
with 100% more content, sorry bout that 13 years ago cmtalbert 693 bytes, patch	mcote : review+	Details \| Diff \| Splinter Review
fixing the same devicemanager typo on talos 13 years ago cmtalbert 641 bytes, patch	mcote : review+	Details \| Diff \| Splinter Review
patch for talos - devicemanager error handling and cleanup (1.1) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 48.39 KB, patch	anodelman : review+	Details \| Diff \| Splinter Review
mozilla-central patch for one line change that came from talos (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 765 bytes, patch	cmtalbert : review+	Details \| Diff \| Splinter Review
remote-testing devicemanager one line sync patch (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 4.31 KB, patch	cmtalbert : review+	Details \| Diff \| Splinter Review
Fixed error handling in isDir() 13 years ago Mark Côté [:mcote] 1.83 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
Fixed getDirectory() with empty dirs 13 years ago Mark Côté [:mcote] 3.78 KB, patch		Details \| Diff \| Splinter Review
Fixed getDirectory() with empty dirs 13 years ago Mark Côté [:mcote] 5.27 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
add a wait in reboot until device comes back up (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 1.56 KB, patch	mcote : review+	Details \| Diff \| Splinter Review
add a wait in reboot until device comes back up (2.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 2.92 KB, patch	mcote : review+	Details \| Diff \| Splinter Review
update talos with reboot, getDirectory and isfile changes (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 4.13 KB, patch	bear : review+	Details \| Diff \| Splinter Review
mozilla-central update for reboot, getDirectory and isdir (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 4.72 KB, patch	mcote : review+	Details \| Diff \| Splinter Review

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Description

•

14 years ago

Attached patch remote-testing sync patch to other branches (1.0) — Details — Splinter Review

currently devicemanager.py is out of sync between remote-testing, mozilla-central and talos.  This is trivial to get them to the same code base.  

Another work item is to change devicemanger.py to have a more robust and documented error handling system.  Part of this is a retry limit for connecting to the device (i.e. dead device and scripts continue on).  This is an attempt to put a framework in place to cleanup return values and error handling in general.

Attachment #502078 - Flags: review?(mcote)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 1

•

14 years ago

first patch uploaded, another patch will be there after some more testing.

Mark Côté [:mcote]

Comment 2

•

14 years ago

Comment on attachment 502078 [details] [diff] [review]
remote-testing sync patch to other branches (1.0)

Yup yup.

Attachment #502078 - Flags: review?(mcote) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 3

•

14 years ago

Attached patch talos devicemanager error handling (obsolete) — Details — Splinter Review

this patch is the second patch in a series which adds a lot of devicemanager error handling.  This requires the remote automation to be cleaned up a bit, but is a lot more robust.

Although this applied to talos, both talos and remote-testing share the same devicemanager.py.

Assignee: nobody → jmaher

Attachment #502129 - Flags: review?(mcote)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 4

•

14 years ago

Attached patch remote-testing devicemanager error handling (1.0) (obsolete) — Details — Splinter Review

I should have put a remote-testing version up first.  This is the most familiar for you mcote and will allow us to get it checked in easier.  I have verified this with the unit tests and it also works for talos.

Attachment #502129 - Attachment is obsolete: true

Attachment #502470 - Flags: review?(mcote)

Attachment #502129 - Flags: review?(mcote)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 5

•

14 years ago

http://hg.mozilla.org/automation/remote-testing/pushloghtml?changeset=66e3f6c74203

Mark Côté [:mcote]

Comment 6

•

14 years ago

Comment on attachment 502470 [details] [diff] [review]
remote-testing devicemanager error handling (1.0)

Looks a lot more robust now with clearer failures... I didn't see any big problems; some comments below.


+  def sendCMD_loop(self, cmdline, newline = True):

The name of this function is a bit confusing; to me, it sounds like this function is the one doing the retry loop.  But maybe that's just me.

+  # external function
+  # returns:
+  #  success: True
+  #  failure: False #bah, I would rather see other than true/false return values
   def pushFile(self, localname, destname):

You could, later perhaps, implement an object variable called something like 'lasterr' that is set to a standard error code from the 'errno' module when there's an error; that wouldn't require any further changes to external interfaces--and it could be set in the appropriate function where the error is detected (socket errors in sendCMD(), file-system errors in pushFile() and similar functions, etc.).  For another patch if we felt like doing it.

   # make directory structure on the device
+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None

I guess this isn't new, but the success return value is strange.  Why is it not the directory name, like in mkDir() above?  Same with pushDir().

+    if (self.agentErrorRE.match(data)):
+      return []

After seeing this over and over, I can't help but think that perhaps this would be better placed in sendCMD(), maybe with an optional parameter to enable the check... it could throw DMError, with an appropriate message.

   def removeDir(self, remoteDir):
-    self.sendCMD(['rmdr ' + remoteDir])
+    try:
+      retVal = self.sendCMD(['rmdr ' + remoteDir])
+    except:
+      return None

This should probably be looking for DMError and not a bare except clause.

+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None
   def poll(self, process):

Another weird success value.

     procre = re.compile('.*' + app + '.*')
 
     procList = self.getProcessList()
-    if (procList == None):
+    if (procList == []):
       return ''

Since getProcessList() now returns an empty list instead of None, technically this check is unneeded, since the for loop that follows will not be executed at all if procList is an empty list, so the returned pid will still be ''.

+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None
   def killProcess(self, appname):

Weird success value.

+  # external function
+  # returns:
+  #  success: ''
+  #  failure: None
   def reboot(self, ipAddr=None, port=30000):

Ditto.

So yeah, maybe a few things to clean up, but generally good.

Attachment #502470 - Flags: review?(mcote) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 7

•

14 years ago

Attached patch remote-testing devicemanager error handling (1.1) (obsolete) — Details — Splinter Review

I actually got all the changes requested, but enough changed that it should have somebody sanity check it.

Attachment #502470 - Attachment is obsolete: true

Attachment #502550 - Flags: review?(mcote)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 8

•

14 years ago

Attached patch remote-testing devicemanager error handling (1.2) (obsolete) — Details — Splinter Review

ok, for final review

Attachment #502550 - Attachment is obsolete: true

Attachment #502589 - Flags: review?(mcote)

Attachment #502550 - Flags: review?(mcote)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 9

•

14 years ago

Attached patch remote-testing devicemanager error handling (1.3) (obsolete) — Details — Splinter Review

fixed bogus = to be ==

Attachment #502589 - Attachment is obsolete: true

Attachment #502602 - Flags: review?(mcote)

Attachment #502589 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 10

•

14 years ago

Comment on attachment 502602 [details] [diff] [review]
remote-testing devicemanager error handling (1.3)


+  # external function
+  # returns:
+  #  success: array of process tuples
+  #  failure: []
   def getProcessList(self):
-    data = self.sendCMD(['ps'])
-    if (data == None):
-      return None
-      
+    try:
+      data = self.sendCMD(['ps'])
+    except:
+      return []
+

Hm I might have missed this earlier, but that should probably not be a bare except.

Everything else looks good, except there's one thing that concerns me: the only call to _sendCMD() is from sendCMD(), so ignoreAgentErrors in _sendCMD() is never being set to False.  Are there in fact no functions that actually need to bypass the error checking then?  That's the only reason I suggested making it optional: in the original code, there were various calls to sendCMD() that didn't check the returned data for agentErrorRE matches.  But now they all check, since they call sendCMD() which calls _sendCMD() with ignoreAgentErrors being True.

Waiting on r+ 'til I hear back...

Mark Côté [:mcote]

Comment 11

•

14 years ago

Sorry, slight modification to what I said: there are no calls to sendCMD() that actually do *check* for agentErrorRE matches, since ignoreAgentErrors is always True.  My thoughts were to have sendCMD() call _sendCMD() with ignoreAgentErrors set to False, since normally we want to check for the error string, and any functions that need to ignore agent errors call _sendCMD() directly.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 12

•

14 years ago

:mcote, could you tie up the loose ends on this patch and get :ctalbert to r+ it?  I would like to keep this ball rolling and at the very least get this into r-t and m-c this week.  I can do the talos stuff next week!

Mark Côté [:mcote]

Comment 13

•

14 years ago

Sure will do.  I'm not entirely sure which functions should have the agentErrorRE checks and which shouldn't, so, unless it's obvious, I'll just fix the patch so that the functions that used to check it will now call the version of sendCMD() that does the check.  We can add more later if necessary.

Mark Côté [:mcote]

Comment 14

•

14 years ago

Attached patch remote-testing devicemanager error handling (1.4) (obsolete) — Details — Splinter Review

Fixes on top of jmaher's last patch:

- new wrapper function is now called verifySendCMD() and old function is still sendCMD(), so that any functions not explicitly calling the former get the old functionality (ignoring agent errors)
- fixed wrapper function to return result from sendCMD()
- replaced all existing agentErrorRE checks and those added in jmaher's first patch with called to verifySendCMD()
- changed a bare "except" to "except DMError"

All unit tests pass except test_inst, which has problems unrelated to the device manager.

Attachment #502602 - Attachment is obsolete: true

Attachment #502866 - Flags: review?(ctalbert)

Attachment #502602 - Flags: review?(mcote)

cmtalbert

Comment 15

•

14 years ago

Comment on attachment 502866 [details] [diff] [review]
remote-testing devicemanager error handling (1.4)

I like the purpose and cleanup done in this patch. It's been needed for a while.  I do have some concerns though:
>diff --git a/tests/devicemanager.py b/tests/devicemanager.py
>--- a/tests/devicemanager.py
>+++ b/tests/devicemanager.py
> class DeviceManager:
>   host = ''
>   port = 0
>   debug = 2 
>-  _redo = False
>+  retrylimit = 5
>+  retries = 0
>   deviceRoot = None
>   tempRoot = os.getcwd()
>   base_prompt = '$>'
>   base_prompt_re = '\$\>'
>   prompt_sep = '\x00'
>   prompt_regex = '.*(' + base_prompt_re + prompt_sep + ')'
>   agentErrorRE = re.compile('^##AGENT-WARNING##.*')
> 
>   def __init__(self, host, port = 20701):
I thought that Aki had specifically requsted that we make the retrylimit configurable?  I understand that the user of DeviceManager could do something like: dm.retrylimit = 11, but it would be nicer (in my mind) to inlcude this option as a parameter on the contstuctor.
It could still default to 5 of course.

>+  def _doCMD(self, cmdline, newline = True):
>     promptre = re.compile(self.prompt_regex + '$')
>     data = ""
>     shouldCloseSocket = False
>     recvGuard = 1000
> 
>     for cmd in cmdline:
>       if newline: cmd += '\r\n'
>       
>       try:
>+        self.retries = 0
This is kind of scary.  So sendCMD calls doCMD in the "retry" while loop.  However, doCMD resets self.retries to 0.
I think what you're doing here is ensuring that each command on the command line is retried "retrylimit" times.  It kind of makes my skin crawl to set the outer loop's control variable from within the funciton being called.  Perhaps a comment here so we'll remember why this is here in the future?  I don't see any better way to do this.
One way this could fail pretty spectacularly though is the following scenario:
1. PushFile <someHugeFile>
2. Push turns into two "commands" - push /sdcard/myhugefile.txt <file contents>
3. If the <file contents> side of the command fails to go through, then how do we know what will get retried?  Ideally the entire "push" command should fail and be retried as a unit.  I don't want to find out that we're retrying the push code and making the agent append file bits to itself a number of times.
Not sure how to test this unforutnately to see would happen in that failure case.

>+  # external function
>+  # returns:
>+  #  success: True
>+  #  failure: False #bah, I would rather see other than true/false return values
What else would you do?  This is a lot better than returning '' vs None, you have to admit.
If there's something better, then why aren't we doing it? FWIW, I like the True/False return on this function.
>   def pushFile(self, localname, destname):
 
>   # push localDir from host to remoteDir on the device
>+  # external function
>+  # returns:
>+  #  success: remoteDir
>+  #  failure: None
>   def pushDir(self, localDir, remoteDir):
>-    if (self.debug >= 2): print "pushing directory: " + localDir + " to " + remoteDir
>+    if (self.debug >= 2): print "pushing directory: %s to %s" % (localDir, remoteDir)
>     for root, dirs, files in os.walk(localDir):
>       parts = root.split(localDir)
>       for file in files:
>         remoteRoot = remoteDir + '/' + parts[1]
>         remoteName = remoteRoot + '/' + file
>         if (parts[1] == ""): remoteRoot = remoteDir
>         if (self.pushFile(os.path.join(root, file), remoteName) == None):
Push file is not going to return None.  It's going to return False ^^^
>+          #NOTE: we do not check the return value here.  This is a sanity check and the file could not exist
Is this referring to the removeFile below?  At this point, the push has failed the first time, and I don't think we care if the remove does or doesn't succeed.  We are just removing in case there was a partial push and we're going to try one more time.
>           self.removeFile(remoteName)
>           if (self.pushFile(os.path.join(root, file), remoteName) == None):
>             return None

>+  # external function
>+  # returns:
>+  #  success: pid
>+  #  failure: None
>   def fireProcess(self, appname):
>     if (self.debug >= 2): print "FIRE PROC: '" + appname + "'"
>     
>     if (self.processExist(appname) != ''):
>       print "WARNING: process %s appears to be running already\n" % appname
>     
>-    self.sendCMD(['exec ' + appname])
>+    try:
>+      data = self.verifySendCMD(['exec ' + appname])
>+    except(DMError):
>+      return None
> 
>-    #NOTE: we sleep for 30 seconds to allow the application to startup
>-    time.sleep(30)
>+    #NOTE: we sleep to allow the application to startup (it could take >3 seconds, but end in <15)
>+    timeslept = 0
>+    while (timeslept <= 30):
>+      self.process = self.processExist(appname)
>+      if (self.process is not ''):
>+        break
>+      time.sleep(3)
>+      timeslept += 3
I'm glad you've cleaned this up.  Much nicer.
> 
>-    self.process = self.processExist(appname)
>     if (self.debug >= 4): print "got pid: " + str(self.process) + " for process: " + str(appname)
>+    return self.process
If the process did not end up starting then self.process is '', not None as you claim you will return in your comment.  It looks like you intend for it to be None if self.ProcessExist finds that the process does not exist.  That change will require changes to a lot of upstream code, but I tend to think that since we're here doing this we should go ahead and fix this now.
We need to change the above if statement to look for None instead of '', and we need to change processExist further down in the code and file a follow up bug to change the rest of the scripts that depend on this API. 
We've already shot ourselves in the foot with this one a couple of times debugging a return of '' vs. None, so I think it'd be better to bite the bullet and change it now.
 
>+  # external function
>+  # returns:
>+  #  success: output filename
>+  #  failure: None
>   def launchProcess(self, cmd, outputFile = "process.txt", cwd = '', env = ''):
>-    self.fireProcess(cmdline)
>+    if self.fireProcess(cmdline) is None:
>+      return None
This will fail to capture a failure unless we change processExist, so more reason to do it now.

>+  #TODO: this might not be used
>   #hardcoded: sleep interval of 5 seconds, timeout of 10 minutes
>+  # external function
>+  # returns:
>+  #  success: [file contents, None]
>+  #  failure: [None, None]
I think this is used by some of the RProcess stuff in remote-automation.py
>   def communicate(self, process, timeout = 600):
>     interval = 5
>     timed_out = True
>     if (timeout > 0):
>       total_time = 0
>       while total_time < timeout:
>         time.sleep(interval)
>-        if (not self.poll(process)):
>+        if self.poll(process):
>           timed_out = False
>           break
This looks wrong.  Communicate should poll the process and only return when the process stops running or if the operation times out.  If the process stops running then timed_Out should be false and we should break out of the loop.  Here, if the process is running, you'll immediately break out of the loop, which is not what we want at all.

>         total_time += interval
> 
>     if (timed_out == True):
>-      return None
>+      return [None, None]
> 
>     return [self.getFile(process, "temp.txt"), None]
> 
> 
>+  # external function
>+  # returns:
>+  #  success: True
>+  #  failure: None #None for legacy reasons
>   def poll(self, process):
>     try:
>       if (self.processExist(process) == ''):
>         return None
>-      return 1
>+      return True
>     except:
>       return None
>-    return 1
>+    return True
Why exactly does this function exist?  It's basically just calling processExist and chnging its return value to a True/None return value.
Let's just fix processExist and remove this function?

>   
>   # iterates process list and returns pid if exists, otherwise ''
>+  # external function
>+  # returns:
>+  #  success: pid
>+  #  failure: ''  #TODO: make this a failure of 'None' to match everything else
Yes, let's change that return value now and file a bug to change upstream consumers. Also change the above comment that explains the functionality so that it matches

>+  #
>+  # external function
>+  # returns:
>+  #  success: path for device root
>+  #  failure: None
>   def getDeviceRoot(self):
>     # This caching of deviceRoot is causing issues if things fail
>     # if (not self.deviceRoot):
Let's take that ^ line out.  We don't use self.deviceRoot anywhere else in the devicemanager and all our consumers call this function when they want it.  So let's just stop caching as an attribute on the class.
>-    data = self.sendCMD(['testroot'])
>-    if (data == None):
>-      return '/tests'
>+    try:
>+      data = self.verifySendCMD(['testroot'])
>+    except:
>+      return None
>+  
>     self.deviceRoot = self.stripPrompt(data).strip('\n') + '/tests'
I don't think there's any need to cache this on the class anymore ^ just do deviceRoot = <blah>
> 
>     if (not self.dirExists(self.deviceRoot)):
>-      self.mkDir(self.deviceRoot)
>+      if (self.mkDir(self.deviceRoot) == None):
>+        return None
> 
>     return self.deviceRoot
> 
>   # Either we will have /tests/fennec or /tests/firefox but we will never have
>   # both.  Return the one that exists
>+  # TODO: ensure we can support org.mozilla.firefox
See below
>+  # external function
>+  # returns:
>+  #  success: path for app root
>+  #  failure: None
>   def getAppRoot(self):
>-    if (self.dirExists(self.getDeviceRoot() + '/fennec')):
>-      return self.getDeviceRoot() + '/fennec'
>-    elif (self.dirExists(self.getDeviceRoot() + '/firefox')):
>-      return self.getDeviceRoot() + '/firefox'
>+    devroot = self.getDeviceRoot()
>+    if (devroot == None):
>+      return None
>+
>+    if (self.dirExists(devroot + '/fennec')):
>+      return devroot + '/fennec'
>+    elif (self.dirExists(devroot + '/firefox')):
>+      return devroot + '/firefox'
>     else:
>       return 'org.mozilla.fennec'
Replace the else with: 
elif (self.dirExsts('/data/data/org.mozilla.fennec')):
  return 'org.mozilla.fennec'
elif (self.dirExists('/data/data/org.mozilla.firefox')):
  return 'org.mozilla.firefox'
else:
  # Failure (either not installed or not a recognized platform)
  return None
Then we properly support Android.
  
>+  # external function
>+  # returns:
>+  #  success: status from test agent
>+  #  failure: None
>   def reboot(self, ipAddr=None, port=30000):
>     cmd = 'rebt'
> 
>     if (self.debug > 3): print "INFO: sending rebt command"
>     if (ipAddr is not None):
>       ip, port = self.getCallbackIpAndPort(ipAddr, port)
> 
>       # Set up our callback server
>       callbacksvr = callbackServer(ip, port, self.debug)
>-      data = self.sendCMD([cmd])
>+      try:
>+        data = self.verifySendCMD([cmd])
>+      except(DMError):
>+        return None
>       status = callbacksvr.disconnect()
>     else:
>-      status = self.sendCMD([cmd])
>+      try:
>+        status = self.sendCMD([cmd])
>+      except(DMError):
>+        return None
>+
>     if (self.debug > 3): print "INFO: rebt- got status back: " + str(status)
>-
>-    return ''
>+    return status
The reboot function looks way complex.  Also, the case where we do not have an IP callback is using sendCMD not verifySendCMD
Why can't we:
if (ipAddr is not None):
  ip, port = self.getCallbackIpAndPort(ipAddr, port)
  callbacksvr = callbackServer(ip, port, self.debug)
try:
  status = self.verifySendCMD([cmd])
except(DMError):
  return None

if (ipAddr is not None):
  return(callbacksvr.disconnect())
else:
  return status

> 
>   # Returns information about the device:
>   # Directive indicates the information you want to get, your choices are:
>   # os - name of the os
>   # id - unique id of the device
>   # uptime - uptime of the device
>   # systime - system time of the device
>   # screen - screen resolution
>   # memory - memory stats
>   # process - list of running processes (same as ps)
>   # disk - total, free, available bytes on disk
>   # power - power status (charge, battery temp)
>   # all - all of them - or call it with no parameters to get all the information
>+  # TODO: figure out the return values for this function
Returns a dict of lists, i.e. {'os': ['os info 1', 'os info 2',...],'uptime':['uptime info1', uptime info2',...]}
For failure it should just return an empty dict this way it matches the  behavior of other functions that return lists
>   def getInfo(self, directive=None):
>     data = None
>     result = {}
>     collapseSpaces = re.compile('  +')
> 
>     directives = ['os', 'id','uptime','systime','screen','memory','process',
>                   'disk','power']
>     if (directive in directives):
>@@ -819,42 +1065,55 @@ class DeviceManager:
>   """
>   Installs the application onto the device
>   Application bundle - path to the application bundle on the device
>   Destination - destination directory of where application should be
>                 installed to (optional)
>   Returns None for success, or output if known failure
>   TODO: we need a better way to know if this works or not
I think we can remove that ^ TODO: we have the better way to know if it works, we can do that by performing the update with the callbacksvr.
>   """
>   Uninstalls the named application from device and causes a reboot.
>   Takes an optional argument of installation path - the path to where the application
>   was installed.
>   Returns True, but it doesn't mean anything other than the command was sent,
>   the reboot happens and we don't know if this succeeds or not.
>   """
>+  # external function
>+  # returns:
>+  #  success: True
>+  #  failure: False (TODO: consider changing this to not be true/false)
Why not go ahead and make this match the return behavior of installApp, so that it should return True on success and None on failure.

>   """
>   Updates the application on the device.
>   Application bundle - path to the application bundle on the device
>   Process name of application - used to end the process if the applicaiton is
>                                 currently running
>@@ -865,52 +1124,71 @@ class DeviceManager:
>   port - port to await a callback ping to let us know that the device has updated properly
>          defaults to 30000, and counts up from there if it finds a conflict
>   Returns True if succeeds, False if not
>   
>   NOTE: We have no real way to know if the device gets updated or not due to the
>         reboot that the udpate call forces on us.  We can't install our own heartbeat
>         listener here because we run the risk of racing with other heartbeat listeners.
We should remove this note now that we have a callback ability.
>   """
>+  # external function
>+  # returns:
>+  #  success: text status from command or callback server
>+  #  failure: None
>   def updateApp(self, appBundlePath, processName=None, destPath=None, ipAddr=None, port=30000):
>     status = None
>     if (ipAddr is not None):
>       ip, port = self.getCallbackIpAndPort(ipAddr, port)
> 
>       cmd += " %s %s" % (ip, port)
Interesting, does the reboot cmd use a different format if it is using the callback server?  We should check on that because in this patch the reboot command uses the same command whether or not a callback server is in place.
> 
>       # Set up our callback server
>       callbacksvr = callbackServer(ip, port, self.debug)
>-      data = self.sendCMD([cmd])
>+
>+      try:
>+        data = self.verifySendCMD([cmd])
>+      except(DMError):
>+        return None
>+
>       status = callbacksvr.disconnect()
>     else:
>-      status = self.sendCMD([cmd])
>+      try:
>+        status = self.verifySendCMD([cmd])
>+      except(DMError):
>+        return None
> 
>-    if (self.debug > 3): print "got status back: " + str(status)
>+    if (self.debug > 3): print "INFO: updateApp: got status back: " + str(status)
>+
>     return status
I feel like this function could have the same structure as the reboot method except that it changes the command run in the ipAddr case (and probably reboot should as well).  It just looks crazy to have such a small function with four different returns.

Attachment #502866 - Flags: review?(ctalbert) → review-

Mark Côté [:mcote]

Comment 16

•

14 years ago

Attached patch remote-testing devicemanager error handling (1.5) — Details — Splinter Review

As discussed, this patch fixes the stuff you commented on, as well as a few more fixes and improvements.

Attachment #502866 - Attachment is obsolete: true

Attachment #503218 - Flags: review?(ctalbert)

cmtalbert

Comment 17

•

14 years ago

Comment on attachment 503218 [details] [diff] [review]
remote-testing devicemanager error handling (1.5)

Looks great, as discussed on IRC let's change the mkDir and getInfo to verifySendCmd and then we'll be all done and ready to go.

Thanks for all the hard work!

Attachment #503218 - Flags: review?(ctalbert) → review+

Mark Côté [:mcote]

Comment 18

•

14 years ago

Pushed to remote-testing as http://hg.mozilla.org/automation/remote-testing/rev/7c5343f57dc8

cmtalbert

Comment 19

•

14 years ago

Attached patch Patch for mc — Details — Splinter Review

This is the patch to update mozilla-central with the new devicemanager and also updates the affected uses of the changed API calls.

Attachment #503267 - Flags: review?(mcote)

cmtalbert

Comment 20

•

14 years ago

Attached patch Patch for Talos (obsolete) — Details — Splinter Review

Patch for Talos that updates devicemanager and the APIS that Talos was using that have now changed.

Attachment #503268 - Flags: review?(jmaher)

cmtalbert

Comment 21

•

14 years ago

Attached patch Update Tegra Tools DeviceManager APIs — Details — Splinter Review

We realized during this code review that the devicemanager had parameters for ipaddress and port for a callback server on the devicemanager.reboot API.  However, the agent was not using these and the devicemanager sent the same command to the agent even if these parameters were specified.  To reduce confusion these parameters have been removed, which was the only API change that affected the tegra tools.  Patch created from Aki's user repo.

Attachment #503270 - Flags: review?(bear)

Mike Taylor [:bear]

Updated

•

14 years ago

Attachment #503270 - Flags: review?(bear) → review+

Mark Côté [:mcote]

Comment 22

•

14 years ago

Comment on attachment 503267 [details] [diff] [review]
Patch for mc

 
-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
             raise devicemanager.FileError("Failed to copy profiledir to device")
 
     def copyExtraFilesToProfile(self, options, profileDir):
         RefTest.copyExtraFilesToProfile(self, options, profileDir)
-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
             raise devicemanager.FileError("Failed to copy extra files to device") 

Hmm for better or for worse, pushDir() still returns None on failure.

         self.localProfile = options.profilePath
-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:

Ditto.

-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:

And again.

Aside from that it's all good.  I verified that, after applying the patch, devicemanager.py in build/mobile is that same as that in remote-testing/tests.

Attachment #503267 - Flags: review?(mcote) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 23

•

14 years ago

Attached patch patch for talos - devicemanager error handling and cleanup (obsolete) — Details — Splinter Review

updated devicemanager to match all work done on mozilla-central and remote-testing.  Updated talos code to handle errors better.  

FYI: I haven't tested this patch as I don't have a setup on a remote device.  I tested an earlier version of this and there are a few changes (pushFile return value, removed poll and using processExist), but overall I feel comfortable that this patch is going to be ready for prime time.

Attachment #503268 - Attachment is obsolete: true

Attachment #503374 - Flags: review?(anodelman)

Attachment #503374 - Flags: feedback?(ctalbert)

Attachment #503268 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 24

•

14 years ago

Comment on attachment 503267 [details] [diff] [review]
Patch for mc

>diff --git a/layout/tools/reftest/remotereftest.py b/layout/tools/reftest/remotereftest.py
>-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
>+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
>             raise devicemanager.FileError("Failed to copy profiledir to device")

pushdir still returns None

> 
>     def copyExtraFilesToProfile(self, options, profileDir):
>         RefTest.copyExtraFilesToProfile(self, options, profileDir)
>-        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == None):
>+        if (self._devicemanager.pushDir(profileDir, options.remoteProfile) == False):
>             raise devicemanager.FileError("Failed to copy extra files to device") 
> 

ditto


>diff --git a/testing/mochitest/runtestsremote.py b/testing/mochitest/runtestsremote.py
>     def buildProfile(self, options):
>         manifest = Mochitest.buildProfile(self, options)
>         self.localProfile = options.profilePath
>-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
>+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:
>             raise devicemanager.FileError("Unable to copy profile to device.")

ditto

>         #we really need testConfig.js (for browser chrome)
>-        if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
>+        if self._dm.pushDir(options.profilePath, self.remoteProfile) == False:
>             raise devicemanager.FileError("Unable to copy profile to device.")

ditto

Attachment #503267 - Flags: feedback-

cmtalbert

Comment 25

•

14 years ago

Comment on attachment 503374 [details] [diff] [review]
patch for talos - devicemanager error handling and cleanup

The error handling looks good to me.  The only question I have is: if all callers to TerminateAllProcesses do not expect a return code, why did you add one?  And when you add one, you added a return -1 for the failure case, when it seems that everything else in Talos land returns 1 on failure and 0 on success.  

You now have TerminateAllProcess returning -1 on failure and None on success.  I'd at least make this match the other Talos APIs or not have it return anything at all rather than creating a third paradigm.

I'll defer to Alice for the final decision there, but that's my two cents.  Otherwise, the patch looks great.

Attachment #503374 - Flags: feedback?(ctalbert) → feedback+

cmtalbert

Comment 26

•

14 years ago

Landed on m-c: http://hg.mozilla.org/mozilla-central/rev/1511d70f1e42
Landed on r-t: http://hg.mozilla.org/automation/remote-testing/rev/7c5343f57dc8
Landed on tegra-tools: http://hg.mozilla.org/users/asasaki_mozilla.com/tegra-tools/rev/a1dbeff89855

Holding off on marking as "FIXED" until it lands in Talos.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 27

•

14 years ago

for the return code from TerminateAllProcess, I added it so it is there if we need it.  I agree on the -1 vs 1 stuff, that should at least match what most other talos APIs do.  

I won't be able to run this in staging until next week, but I will do that first thing to get this ball rolling.

cmtalbert

Comment 28

•

13 years ago

Attached patch fixing a mistake (obsolete) — Details — Splinter Review

found an error on m-c.  it's also in the talos version.  Patch upcoming for talos too.

Attachment #503853 - Flags: review?(mcote)

cmtalbert

Comment 29

•

13 years ago

Attached patch with 100% more content, sorry bout that — Details — Splinter Review

Attachment #503853 - Attachment is obsolete: true

Attachment #503868 - Flags: review?(mcote)

Attachment #503853 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 30

•

13 years ago

Comment on attachment 503868 [details] [diff] [review]
with 100% more content, sorry bout that

Great, good catch.

Attachment #503868 - Flags: review?(mcote) → review+

cmtalbert

Comment 31

•

13 years ago

Attached patch fixing the same devicemanager typo on talos — Details — Splinter Review

This should be applied atop attachment 503374 [details] [diff] [review], it fixes the same devicemanager typo for the talos tree.

Attachment #503955 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 32

•

13 years ago

Comment on attachment 503955 [details] [diff] [review]
fixing the same devicemanager typo on talos

Yup yup!

Attachment #503955 - Flags: review?(mcote) → review+

alice nodelman [:alice] [:anode]

Comment 33

•

13 years ago

I don't know if I'm the right person to handle devicemanager patches... I'm not very familiar with the code.  Might be more appropriate for bear/bmoss?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 34

•

13 years ago

Alice, I intended for you to review the changes to the non devicemanager.py code.  Overall the devicemanager.py has had 3 people reviewing it over and over again.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 35

•

13 years ago

I just ran this through talos staging and it is green.  Just need a sanity check review on the core talos files and we can get this landed.

alice nodelman [:alice] [:anode]

Comment 36

•

13 years ago

I'm a little confused by this chunk:

     def TerminateAllProcesses(self, *process_names):
         """Helper function to terminate all processes with the given process name
 
         Args:
           process_name: String or strings containing the process name, i.e. "firefox"
         """
         for process_name in process_names:
             try:
-                self.testAgent.killProcess(process_name)
+                if (self.testAgent.killProcess(process_name) is None):
+                    # All code which call TerminateAllProcesses does not check for a return code
+                    return -1
             except:
                 # Might get an exception if there are no instances of the process running.
                 continue

Does this mean that we stop attempting to close processes upon finding a failed (None) killProcess?  Shouldn't we continue to close processes just for cleanliness?  Either that, or the comment is unclear as to what is happening.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 37

•

13 years ago

Attached patch patch for talos - devicemanager error handling and cleanup (1.1) — Details — Splinter Review

updated patch to remove the early termination in TerminateAllProcesses inside ffprocess_remote.py.  The first patch went a bit overboard on error handling and that was unnecessary and doing the wrong thing.

Attachment #503374 - Attachment is obsolete: true

Attachment #504856 - Flags: review?(anodelman)

Attachment #503374 - Flags: review?(anodelman)

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Attachment #504856 - Flags: review?(anodelman) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 38

•

13 years ago

passed on staging, landed on talos:
http://hg.mozilla.org/build/talos/pushloghtml?changeset=479f9cff5e8e

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 39

•

13 years ago

Attached patch mozilla-central patch for one line change that came from talos (1.0) — Details — Splinter Review

simple fix to get in sync with talos.

Attachment #505219 - Flags: review?(ctalbert)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 40

•

13 years ago

Attached patch remote-testing devicemanager one line sync patch (1.0) — Details — Splinter Review

one line fix to remote-testing repository for self.process->process.

Attachment #505221 - Flags: review?(ctalbert)

cmtalbert

Updated

•

13 years ago

Attachment #505219 - Flags: review?(ctalbert) → review+

cmtalbert

Comment 42

•

13 years ago

Comment on attachment 505221 [details] [diff] [review]
remote-testing devicemanager one line sync patch (1.0)

Looks like you have some changes here to the python agent in your diff as well.  I think those changes look fine (they all add functionality, not decrease functionality) so I'll r+ those too.

Attachment #505221 - Flags: review?(ctalbert) → review+

Mike Taylor [:bear]

Comment 43

•

13 years ago

calls to removeDir() are getting a return of None but the output looks like it's working - see http://pastebin.mozilla.org/965728 for log output details

Mark Côté [:mcote]

Comment 44

•

13 years ago

One-line fix in patch 503868 ("with 100% more content, sorry bout that") pushed to remote-testing as http://hg.mozilla.org/automation/remote-testing/rev/2be667ebf48e

Mark Côté [:mcote]

Comment 45

•

13 years ago

Attached patch Fixed error handling in isDir() — Details — Splinter Review

Just found this one.  DeviceManager.isDir() isn't handling AGENT-WARNINGs correctly.  I wasn't able to reproduce such a warning (it isn't generated normally when there is no file or directory), but I saw it earlier when testing with my Fennec profile tool.  I also included an isDir unit test for completeness, though unfortunately it doesn't reproduce this particular bug.

This is a patch against remote-testing, btw.

Attachment #508578 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 46

•

13 years ago

Comment on attachment 508578 [details] [diff] [review]
Fixed error handling in isDir()

good catch here.  This appears solve one of the problems that :bear has seen on and off with removeDir.

For the unit test, I would like to see something like this added:
1) create dir
2) create dir/file
3) verify dir
4) verify dir/file
5) delete dir
6) verify dir == false
7) verify dir/file == false

r=me with consideration to additional unit tests

Attachment #508578 - Flags: review?(jmaher) → review+

Mark Côté [:mcote]

Comment 47

•

13 years ago

Okay the test already does most of that; I just added 5, 6, & 7.  Though step 4 should be verify dir/file == false, since isDir() should return False for files.

Pushed as http://hg.mozilla.org/automation/remote-testing/rev/cb81f84f53c4

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Blocks: 630647

Mark Côté [:mcote]

Comment 48

•

13 years ago

Attached patch Fixed getDirectory() with empty dirs (obsolete) — Details — Splinter Review

Here's another problem: getDirectory() aborts if it finds an empty directory.  It shouldn't.

Includes unit test (but no batteries).

Attachment #508870 - Flags: review?(jmaher)

Mark Côté [:mcote]

Comment 49

•

13 years ago

Attached patch Fixed getDirectory() with empty dirs — Details — Splinter Review

This preserves the old behaviour in which None is returned if the directory doesn't exist but still creates the directory if it exists but is empty.  Added an optional parameter so that the recursive call doesn't call isDir() twice.

Attachment #508870 - Attachment is obsolete: true

Attachment #508890 - Flags: review?(jmaher)

Attachment #508870 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 50

•

13 years ago

Comment on attachment 508890 [details] [diff] [review]
Fixed getDirectory() with empty dirs

this looks good, thanks mcote

Attachment #508890 - Flags: review?(jmaher) → review+

Mark Côté [:mcote]

Comment 51

•

13 years ago

Pushed as http://hg.mozilla.org/automation/remote-testing/rev/a6d9166bb4de

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 52

•

13 years ago

Attached patch add a wait in reboot until device comes back up (1.0) (obsolete) — Details — Splinter Review

another patch as a request from releng to have reboot wait until the device comes online before returning.

Attachment #508901 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 53

•

13 years ago

Comment on attachment 508901 [details] [diff] [review]
add a wait in reboot until device comes back up (1.0)

Looks good, though you could tighten up that loop like so:

    if (wait == True):
      #this sleeps up to 5 minutes in 30 second intervals
      count = 0
      while (count < 10):
        if (self.debug >= 4): print "DEBUG: sleeping 30 seconds while waiting for reboot"
        time.sleep(30)
        waitstatus = self.getDeviceRoot()
        if (waitstatus is not None):
          break
        self.retries = 0
        count += 1

That gets rid of the duplicate code and should work the same way, since you are doing the sleep right away.

Is it not an error for it to time out?  We don't need to raise an exception if count == 10?

r+ though I would like to hear the answer to my last question.

Attachment #508901 - Flags: review?(mcote) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 54

•

13 years ago

Attached patch add a wait in reboot until device comes back up (2.0) — Details — Splinter Review

asking for review since I have unit tests to add here.  To answer the questions from previous review, I have simplified the code as mentioned.  As for the error condition being an exception vs a return code, it is preferable to work with existing scripts to be a return value, not an exception.

Attachment #508901 - Attachment is obsolete: true

Attachment #509150 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 55

•

13 years ago

Comment on attachment 509150 [details] [diff] [review]
add a wait in reboot until device comes back up (2.0)

The modifications to the reboot() command look good.  Just a couple comments about the unit test:

+        start = datetime.datetime.now()
+        info = self.dm.getInfo('uptime')
+        end = datetime.datetime.now()

On my system the first half of the unit test failed because, at some point during the reboot cycle, the reconnect attempts failed immediately, so it ran out of retries before the device came back up.  I put the getInfo() call in a loop, and it worked.  Here's what I did:

        start = datetime.datetime.now()
        count = 0
        while count < 5:
            try:
                info = self.dm.getInfo('uptime')
            except devicemanager.DMError:
                time.sleep(10)
                count += 1
            else:
                break
        self.assertTrue(count < 5)
        end = datetime.datetime.now()

This should also work for your case, in which the getInfo() command hangs until the device is back up.  Note that you'll also need an "import devicemanager" at the top of the file.  Please try this out on your device.

+        if (not matches):
+            print "error getting uptime information"

Might as well replace these two lines with

        self.assertTrue(matches, "error getting uptime information")

since, if matches == None, the test will throw an exception immediately after this anyway when it starts trying to access matches.group().

So yeah, try the unit test with my modification on your system, and if it works, we're good here.

Attachment #509150 - Flags: review?(mcote) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 56

•

13 years ago

reboot fix and unittests landed on remote-testing:
http://hg.mozilla.org/automation/remote-testing/pushloghtml?changeset=7f84892f72b1

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 57

•

13 years ago

Attached patch update talos with reboot, getDirectory and isfile changes (1.0) — Details — Splinter Review

all of these changes are on remote-testing, porting to talos.

Attachment #509473 - Flags: review?(ctalbert)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Attachment #509473 - Flags: review?(ctalbert) → review?(bear)

Mike Taylor [:bear]

Updated

•

13 years ago

Attachment #509473 - Flags: review?(bear) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 58

•

13 years ago

pushed to talos:
http://hg.mozilla.org/build/talos/pushloghtml?changeset=f8eaf58fc0e3

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 59

•

13 years ago

Attached patch mozilla-central update for reboot, getDirectory and isdir (1.0) — Details — Splinter Review

update mozilla-central with remote-testing and talos changes to devicemanager.py

Attachment #509494 - Flags: review?(mcote)

Mark Côté [:mcote]

Comment 60

•

13 years ago

Comment on attachment 509494 [details] [diff] [review]
mozilla-central update for reboot, getDirectory and isdir (1.0)

Looks good, and after applying this patch devicemanager.py is the same in all 3 locations.  Yay!

Attachment #509494 - Flags: review?(mcote) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 61

•

13 years ago

http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=5a4ab9b3fd2f

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: New Frameworks → General

You need to log in before you can comment on or make changes to this bug.