Closed Bug 795456 Opened 12 years ago Closed 12 years ago

mozdevice should always use exceptions, not return codes to communicate errors

Categories

(Testing :: Mozbase, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

References

Details

Attachments

(1 file, 4 obsolete files)

We've already gone some of the way to doing this, and it's time to finish the job. In general, return codes have proven to be a bad way to communicate problems. It's hard to distinguish them from a successful command and when things do go wrong it's not entirely clear what happened. A good example with this would be bug 772531: we added a specific log message to indicate reboots because it was not clear from the logs what was happening, other than a failure. Moreover, it makes downstream code much harder to write. It's much, much easier to wrap a sequence of devicemanager calls in a try...except block than to painstakingly check the results of each command for errors. Yes, this is a somewhat disruptive change, but it will pay dividends. I think the best time to make the change is now, when we're making other large scale modifications to mozdevice anyway (e.g. bug 793236). That way consumers of the API can change everything at once. I'll try to update at least mozilla-central and talos to account for any changes here.
Have a preliminary patch, not quite ready for review but I want to run it through try over the weekend: https://tbpl.mozilla.org/?tree=Try&rev=662e58793a80
Try run for d7e786fdc5b2 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=d7e786fdc5b2 Results (out of 20 total builds): success: 2 warnings: 18 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/wlachance@mozilla.com-d7e786fdc5b2
What the bot failed to mention is that it doesn't have a clue about RETRY, so it doesn't realize that if I don't kill it, your robocop will literally run over the weekend, since in http://hg.mozilla.org/build/buildbotcustom/annotate/9966755da4cb/status/errors.py#l7 we automatically retry on devicemanager.DMError in the log, and it's producing one.
Ok, so that try run didn't succeed very well, but after a bit of hacking I got things into a state where it did: There are two oranges due to the fact that pullFile doesn't currently retry if it fails. This problem was being masked by the fact that we used to ignore pullFile errors. I implemented what amounts to the same effect by explicitly catch DMErrors here and ignoring them. I'd like to do a few more try runs to make sure this won't break anything, but I think I'm ready to get some review on what I have so far.
(In reply to William Lachance (:wlach) from comment #4) > Ok, so that try run didn't succeed very well, but after a bit of hacking I > got things into a state where it did: Oops, forgot link: https://tbpl.mozilla.org/?tree=Try&rev=b9f4baccf14f
There's a lot in this patch: see the documentation at the top of the patch file for more info. This is probably best viewed in conjunction with my corresponding patch to m-c. Asking :jmaher and :edmorley for feedback to make sure there's nothing here that will break stuff madly.
Attachment #666679 - Flags: review?(ahalberstadt)
Attachment #666679 - Flags: feedback?(jmaher)
Attachment #666679 - Flags: feedback?(bmo)
Going to ask jmaher to review this one as he's most familiar with this code. Getting feedback from ahal (for B2G stuff and general) and edmorley (to make sure this will play nicely with tbpl)
Attachment #666685 - Flags: review?(jmaher)
Attachment #666685 - Flags: feedback?(bmo)
Attachment #666685 - Flags: feedback?(ahalberstadt)
Comment on attachment 666685 [details] [diff] [review] Corresponding changes to remote automation Review of attachment 666685 [details] [diff] [review]: ----------------------------------------------------------------- 1) we are making changes to b2gautomation.py which remove shell and switch to _checkCmd. That should be in a different patch. 2) we currently have specific error messages to indicate where we failed in the automation, the changes in this patch just bubble up a pushfile failure, which doesn't help. ::: build/mobile/remoteautomation.py @@ +157,4 @@ > > @property > def pid(self): > + return self.dm.processExist(self.procName) I really don't like this. We have scenarios where fennec doesn't start up for a long time (or at least in the past) which mean we need to return 0 instead of throw an exception. ::: testing/mochitest/runtestsremote.py @@ -308,5 @@ > return "NO_CHROME_ON_DROID" > path = '/'.join(parts[:-1]) > manifest = path + "/chrome/" + os.path.basename(filename) > - if self._dm.pushFile(filename, manifest) == False: > - raise devicemanager.FileError("Unable to install Chrome files on device.") my biggest concern here is we are going to be outputting a different error message by just letting the exception from pushFile bubble up.
Attachment #666685 - Flags: review?(jmaher) → review-
Comment on attachment 666679 [details] [diff] [review] Patch to mozdevice to make it return exceptions Review of attachment 666679 [details] [diff] [review]: ----------------------------------------------------------------- a lot of functionality changes that I didn't expect as well as all the work that edmorley just landed in the last couple weeks to get the tbpl parsers to work with our error messages. ::: mozdevice/mozdevice/devicemanagerADB.py @@ +181,4 @@ > > def mkDir(self, name): > """ > Creates a single directory on the device file system unrelated...I always assumed that mkDir would create all directories like mkdir -p does @@ +237,2 @@ > except: > print "pushing " + localDir + " to " + remoteDir + " failed" do we need to throw an exception here? @@ +289,2 @@ > """ > p = self._runCmd(["shell", "ls", "-a", remotePath + '/']) hmm, isDir as a name indicates true/false return, but we have removed the comment indicating what this returns. @@ -527,5 @@ > - localFile = self._runPull(remoteFile, localFile) > - > - if localFile is None: > - print 'Automation Error: failed to pull file %s!' % remoteFile > - return None the behavior seems different here since we are getting localFile as a return from _runPull and doing an open() on it below, now we are not doing it? @@ +471,2 @@ > """ > + contents = self.pullFile(remoteFile) what do we do if contents in None? @@ -554,5 @@ > > fhandle = open(localFile, 'wb') > fhandle.write(contents) > fhandle.close() > - return contents why are we not returning this anymore? @@ -586,5 @@ > - line = p.stdout.readline() > - #the last line is a summary > - if (len(ret) > 0): > - ret.pop() > - return ret why are we not returning a list of files here anymore? @@ +618,3 @@ > """ > timestr = self._runCmd(["shell", "date", "+%s"]).stdout.read().strip() > if (not timestr or not timestr.isdigit()): where do we raise an exception here? @@ -819,5 @@ > - data = self._runCmd(["uninstall", appName]).stdout.read().strip() > - status = data.split('\n')[0].strip() > - if status == 'Success': > - return > - raise DMError("uninstall failed for %s" % appName) how will we know this failed? @@ -921,5 @@ > Recursively changes file permissions in a directory > - > - returns: > - success: True > - failure: False where do we remove the return False? where do we throw an error? ::: mozdevice/mozdevice/devicemanagerSUT.py @@ -371,5 @@ > - destname = destname + os.path.basename(localname) > - if (self.validateFile(destname, localname) == True): > - if (self.debug >= 3): > - print "files are validated" > - return True we are removing the code that checks the hash before pushing the file. This is no big deal, but we are sneaking in changes that are unrelated. @@ -375,5 @@ > - return True > - > - if self.mkDirs(destname) == None: > - print "Automation Error: unable to make dirs: " + destname > - return False we are removing the code that makes the root directory structure here, that changes the function. @@ +355,5 @@ > + localHash = self._getLocalHash(localname) > + > + if localHash != remoteHash: > + raise DMError("Push File failed to Validate! (localhash: %s, " > + "remotehash: %s)" % (localHash, remoteHash)) please keep the 'Automation Error' string in here for tbpl. @@ +405,2 @@ > """ > + return self.isDir(dirname) why not just rename isDir to dirExists and get rid of the redundancy? Likewise we are removing the documentation for what we are returning here. @@ -495,5 @@ > - the device file system > - > - returns: > - success: True > - failure: False we are removing the documentation of what we are returning here. @@ -502,5 @@ > containingpath = '/'.join(s[:-1]) > - listfiles = self.listFiles(containingpath) > - for f in listfiles: > - if (f == s[-1]): > - return True before we checked to see if the file was displayed in the returning results, now we just return the results? Also we are changing this from a true/false to returning a string. @@ +447,5 @@ > try: > + if self.fileExists(filename): > + self._runCmds([{ 'cmd': 'rm ' + filename }]) > + except DMError, e: > + raise DMError("Error removing file from device: %s" % e.msg) Can we add the 'Automation Error' into the error string? @@ -624,3 @@ > > - returns: > - success: output filename we do not document what is returned from this function. @@ +590,4 @@ > # However it means we can't use the response-handling logic in sendCMD(). > > def err(error_msg): > + err_str = 'Error pulling file: %s' % error_msg why did we remove the DeviceManager from the err_str? @@ +671,4 @@ > # prompt should follow > read_exact(len(prompt), buf, 'could not find prompt') > # failures are expected, so don't use "Remote Device Error" or we'll RETRY > + raise DMError("pulling file '%s' unsuccessful: %s" % (remoteFile, error_str)) all DMError() calls should have DeviceManager in the front so tbpl can match the error. @@ +696,3 @@ > > fhandle = open(localFile, 'wb') > + fhandle.write(data) what if data is None? we will generate a TypeError and it will be confusing to debug. @@ +707,4 @@ > """ > if (self.debug >= 2): > print "getting files in '" + remoteDir + "'" > + if checkDir and not self.isDir(remoteDir): isDir returns false? @@ -893,5 @@ > Checks if remotePath is a directory on the device > - > - returns: > - success: True > - failure: False please retain some form of documentation on the return values here. @@ -914,5 @@ > Checks if the remoteFile has the same md5 hash as the localFile > - > - returns: > - success: True > - failure: False why are we are losing these return values? @@ +802,3 @@ > > + if not self.dirExists(self.deviceRoot): > + print "MKDIR" extra debugging left in? @@ -1104,3 @@ > > - if (data is None): > - continue we are removing this check which could cause other errors, is there a specific reason? @@ +942,5 @@ > > try: > data = self._runCmds([{ 'cmd': cmd }]) > + except DMError, e: > + raise DMError("Error installing app: %s" % e.msg) we lost the 'remote device error' string in the error message. @@ +962,5 @@ > cmd += ' ' + installPath > try: > data = self._runCmds([{ 'cmd': cmd }]) > + except DMError, e: > + raise DMError("Error uninstalling %s: %s" % (appName, e.msg)) again, we lost the remote device error
Attachment #666679 - Flags: feedback?(jmaher) → feedback-
Comment on attachment 666679 [details] [diff] [review] Patch to mozdevice to make it return exceptions Review of attachment 666679 [details] [diff] [review]: ----------------------------------------------------------------- I'm r-'ing because I think we should try and preserve the original traceback whenever we can. I don't know if there's a way to pass it along to a new raise call, but at the very least I think we should be printing it at a certain debug level. I'm also not a fan of the except DMError: raise DMError pattern. Though, if you really want the new message I'd be fine with just printing the original traceback. There's also a few other misc nits. Anyway, this patch is really awesome, can't wait to get it landed. Thanks! ::: mozdevice/mozdevice/devicemanagerADB.py @@ +177,4 @@ > destname = destname + "/" + os.path.basename(localname) > return True > except: > + raise DMError("Error pushing file to device") So I think we should try to preserve the original traceback wherever possible. I'm not sure how to do it with custom exceptions otoh, but maybe we could at least print the original traceback to a certain log level: import traceback if self.debug >= 3: traceback.print_exc() Or letting the exception propagate would be easier (assuming it doesn't break the buildbot code). @@ +191,2 @@ > except: > + raise DMError("Error creating directory") The first exception's message will be lost. Also preserving traceback again would be helpful. @@ +449,4 @@ > self._checkCmdAs(["shell", "rm", remoteTmpFile]) > return localFile > except (OSError, ValueError): > + raise DMError("Error pulling remote file '%s' to '%s'" % (remoteFile, localFile)) traceback @@ +607,5 @@ > + destPath - Destination directory to where the application should be installed (optional) > + ipAddr - IP address to await a callback ping to let us know that the device has updated > + properly - defaults to current IP. > + port - port to await a callback ping to let us know that the device has updated properly > + defaults to 30000, and counts up from there if it finds a conflict I know this is how dmSUT works, but should we really document it in the dmADB source if it doesn't do anything? No strong preference, so feel free to disregard. @@ +628,5 @@ > Directive indicates the information you want to get, your choices are: > os - name of the os > id - unique id of the device > uptime - uptime of the device > + uptimemillis - uptime of the device in milliseconds (NOT supported on all implementations) Same comment as above @@ -818,2 @@ > """ > - data = self._runCmd(["uninstall", appName]).stdout.read().strip() Did you mean to remove this? ::: mozdevice/mozdevice/devicemanagerSUT.py @@ +181,4 @@ > try: > sent = self._sock.send(cmdline) > if sent != len(cmdline): > + raise DMError("ERROR: our cmd was %s bytes and we " Will this message get lost? @@ +322,5 @@ > # be deprecated at some point in the future) > self._sendCmds([ { 'cmd': '%s su -c "%s"' % (cmd, cmdline) }], outputfile, > timeout) > + except DMError, e: > + raise DMError("Error running shell command '%s': %s" % (cmdline, e.msg)) traceback @@ +344,5 @@ > + with open(localname, 'rb') as f: > + remoteHash = self._runCmds([{ 'cmd': 'push ' + destname + ' ' + str(filesize), > + 'data': f.read() }]).strip() > + except OSError: > + raise DMError("Error reading file to push") traceback @@ +346,5 @@ > + 'data': f.read() }]).strip() > + except OSError: > + raise DMError("Error reading file to push") > + except DMError, e: > + raise DMError("Error pushing file: %s" % e.msg) Doesn't seem very useful to catch a DMError and immediately raise a new one. @@ +365,5 @@ > + try: > + if not self.dirExists(name): > + self._runCmds([{ 'cmd': 'mkdr ' + name }]) > + except DMError, e: > + raise DMError("Failed to create directory: %s" % e.msg) Same comment as above. @@ +396,5 @@ > + self.pushFile(os.path.join(root, f), remoteName) > + > + except DMError, e: > + raise DMError("Error pushing directory %s. Error: %s" % (remoteDir, > + e.msg)) Same comment. @@ +405,2 @@ > """ > + return self.isDir(dirname) Is there a reason for keeping this other than backwards compatibility? We should at least indicate it's deprecated if not remove it completely. @@ +415,5 @@ > containingpath = '/'.join(s[:-1]) > + try: > + return s[-1] in self.listFiles(containingpath) > + except DMError, e: > + raise DMError("Error determining whether file exists: %s" % e.msg) No point in catch and raising. @@ +428,2 @@ > try: > + if (self.dirExists(rootdir) == False): Change to isDir() @@ +431,3 @@ > data = self._runCmds([{ 'cmd': 'cd ' + rootdir }, { 'cmd': 'ls' }]) > + except DMError, e: > + raise DMError("Error listing files on device: %s" % e.msg) No catch and raise @@ +447,5 @@ > try: > + if self.fileExists(filename): > + self._runCmds([{ 'cmd': 'rm ' + filename }]) > + except DMError, e: > + raise DMError("Error removing file from device: %s" % e.msg) Same comment. I'll stop pointing them out now ;) @@ +456,3 @@ > """ > try: > + if self.dirExists(remoteDir): isDir() @@ +802,3 @@ > > + if not self.dirExists(self.deviceRoot): > + print "MKDIR" Forgot to remove debug statement?
Attachment #666679 - Flags: review?(ahalberstadt)
Attachment #666679 - Flags: review-
Attachment #666679 - Flags: feedback?(jmaher)
Attachment #666679 - Flags: feedback-
Attachment #666679 - Flags: feedback?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #9) > @@ +671,4 @@ > > # prompt should follow > > read_exact(len(prompt), buf, 'could not find prompt') > > # failures are expected, so don't use "Remote Device Error" or we'll RETRY > > + raise DMError("pulling file '%s' unsuccessful: %s" % (remoteFile, error_str)) > > all DMError() calls should have DeviceManager in the front so tbpl can match > the error. TBPL does not match "DeviceManager:", since it's used for information/warnings rather than fatal error cases - and so would show up all the time in the annotated summary, when we don't want it there. That said, I like consistency in the logs, so using "DeviceManager:" as a prefix in general is good IMO.
Comment on attachment 666679 [details] [diff] [review] Patch to mozdevice to make it return exceptions Minusing for the lost "Remote Device Error:" strings.
Attachment #666679 - Flags: feedback?(bmo) → feedback-
Comment on attachment 666685 [details] [diff] [review] Corresponding changes to remote automation Cancelling given jmaher's comments.
Attachment #666685 - Flags: feedback?(bmo)
(In reply to Joel Maher (:jmaher) from comment #9) > Comment on attachment 666679 [details] [diff] [review] > Patch to mozdevice to make it return exceptions > > Review of attachment 666679 [details] [diff] [review]: > ----------------------------------------------------------------- > > a lot of functionality changes that I didn't expect as well as all the work > that edmorley just landed in the last couple weeks to get the tbpl parsers > to work with our error messages. Some of the functionality changes were unavoidable, or were IMO for the better. The tbpl parser stuff was an oversight though. > @@ +237,2 @@ > > except: > > print "pushing " + localDir + " to " + remoteDir + " failed" > > do we need to throw an exception here? We probably should be, fixed. > @@ +289,2 @@ > > """ > > p = self._runCmd(["shell", "ls", "-a", remotePath + '/']) > > hmm, isDir as a name indicates true/false return, but we have removed the > comment indicating what this returns. I fixed this by removing isDir, and replacing all uses of it with dirExists. I documented dirExists the same way python does in the os.path module: "Return True if remotePath is an existing directory on the device." Hopefully that is unambiguous enough for everyone. :) > @@ -527,5 @@ > > - localFile = self._runPull(remoteFile, localFile) > > - > > - if localFile is None: > > - print 'Automation Error: failed to pull file %s!' % remoteFile > > - return None > > the behavior seems different here since we are getting localFile as a return > from _runPull and doing an open() on it below, now we are not doing it? Well, _runPull doesn't really return any output. It's supposed to throw an exception if it doesn't work. > @@ +471,2 @@ > > """ > > + contents = self.pullFile(remoteFile) > > what do we do if contents in None? It shouldn't be None. We should at least have an empty string unless pullFile outright fails (in which case an exception would be thrown). > @@ -554,5 @@ > > > > fhandle = open(localFile, 'wb') > > fhandle.write(contents) > > fhandle.close() > > - return contents > > why are we not returning this anymore? Because you should just use pullFile if you want the file contents. :) We were using this as a sort of return value to indicate success or failure, but we obviously don't need that anymore. I think it's less confusing this way. > @@ -586,5 @@ > > - line = p.stdout.readline() > > - #the last line is a summary > > - if (len(ret) > 0): > > - ret.pop() > > - return ret > > why are we not returning a list of files here anymore? Same as above. > @@ +618,3 @@ > > """ > > timestr = self._runCmd(["shell", "date", "+%s"]).stdout.read().strip() > > if (not timestr or not timestr.isdigit()): > > where do we raise an exception here? This was an oversight. Fixed. > @@ -819,5 @@ > > - data = self._runCmd(["uninstall", appName]).stdout.read().strip() > > - status = data.split('\n')[0].strip() > > - if status == 'Success': > > - return > > - raise DMError("uninstall failed for %s" % appName) > > how will we know this failed? The removal of this was unintentional. Fixed. > @@ -921,5 @@ > > Recursively changes file permissions in a directory > > - > > - returns: > > - success: True > > - failure: False > > where do we remove the return False? where do we throw an error? Per the scope of this bug, the idea is that a command like this should just return None, and throw an exception in case of error. > ::: mozdevice/mozdevice/devicemanagerSUT.py > @@ -371,5 @@ > > - destname = destname + os.path.basename(localname) > > - if (self.validateFile(destname, localname) == True): > > - if (self.debug >= 3): > > - print "files are validated" > > - return True > > we are removing the code that checks the hash before pushing the file. This > is no big deal, but we are sneaking in changes that are unrelated. Yes, this is true. I made this modification so it would be easy to test. > @@ -375,5 @@ > > - return True > > - > > - if self.mkDirs(destname) == None: > > - print "Automation Error: unable to make dirs: " + destname > > - return False > > we are removing the code that makes the root directory structure here, that > changes the function. This should still be happening, just later in the function. > @@ +355,5 @@ > > + localHash = self._getLocalHash(localname) > > + > > + if localHash != remoteHash: > > + raise DMError("Push File failed to Validate! (localhash: %s, " > > + "remotehash: %s)" % (localHash, remoteHash)) > > please keep the 'Automation Error' string in here for tbpl. Fixed. > @@ +405,2 @@ > > """ > > + return self.isDir(dirname) > > why not just rename isDir to dirExists and get rid of the redundancy? > Likewise we are removing the documentation for what we are returning here. Good idea. See my comments on isDir/dirExists above. > @@ -495,5 @@ > > - the device file system > > - > > - returns: > > - success: True > > - failure: False > > we are removing the documentation of what we are returning here. I changed dirExists/fileExists to be more explicit about what they return in the docstring. > @@ -502,5 @@ > > containingpath = '/'.join(s[:-1]) > > - listfiles = self.listFiles(containingpath) > > - for f in listfiles: > > - if (f == s[-1]): > > - return True > > before we checked to see if the file was displayed in the returning results, > now we just return the results? Also we are changing this from a true/false > to returning a string. I think you misread this function. We should still be returning True/False. > @@ +447,5 @@ > > try: > > + if self.fileExists(filename): > > + self._runCmds([{ 'cmd': 'rm ' + filename }]) > > + except DMError, e: > > + raise DMError("Error removing file from device: %s" % e.msg) > > Can we add the 'Automation Error' into the error string? Fixed by removing the exception :) (per ahal's suggestion, which I'll respond to seperately) > @@ -624,3 @@ > > > > - returns: > > - success: output filename > > we do not document what is returned from this function. Fixed. > @@ +590,4 @@ > > # However it means we can't use the response-handling logic in sendCMD(). > > > > def err(error_msg): > > + err_str = 'Error pulling file: %s' % error_msg > > why did we remove the DeviceManager from the err_str? I reverted this change. > @@ +671,4 @@ > > # prompt should follow > > read_exact(len(prompt), buf, 'could not find prompt') > > # failures are expected, so don't use "Remote Device Error" or we'll RETRY > > + raise DMError("pulling file '%s' unsuccessful: %s" % (remoteFile, error_str)) > > all DMError() calls should have DeviceManager in the front so tbpl can match > the error. Really? That's not consistent with what was in the file previously. I changed the above to "automation error". > @@ +696,3 @@ > > > > fhandle = open(localFile, 'wb') > > + fhandle.write(data) > > what if data is None? we will generate a TypeError and it will be confusing > to debug. data should not be None. We get at least an empty string (we throw an exception on failure). > @@ +707,4 @@ > > """ > > if (self.debug >= 2): > > print "getting files in '" + remoteDir + "'" > > + if checkDir and not self.isDir(remoteDir): > > isDir returns false? Fixed by changing to dirExists > @@ -893,5 @@ > > Checks if remotePath is a directory on the device > > - > > - returns: > > - success: True > > - failure: False > > please retain some form of documentation on the return values here. Done, see comments above. > @@ -914,5 @@ > > Checks if the remoteFile has the same md5 hash as the localFile > > - > > - returns: > > - success: True > > - failure: False > > why are we are losing these return values? I changed this to: Returns True if remoteFile has the same md5 hash as the localFile It's good to document return values, but the old style was way more verbose than necessary. > @@ +802,3 @@ > > > > + if not self.dirExists(self.deviceRoot): > > + print "MKDIR" > > extra debugging left in? Yup. :) > @@ -1104,3 @@ > > > > - if (data is None): > > - continue > > we are removing this check which could cause other errors, is there a > specific reason? Data should never be None. We should at least have the empty string. > @@ +942,5 @@ > > > > try: > > data = self._runCmds([{ 'cmd': cmd }]) > > + except DMError, e: > > + raise DMError("Error installing app: %s" % e.msg) > > we lost the 'remote device error' string in the error message. Fixed. > @@ +962,5 @@ > > cmd += ' ' + installPath > > try: > > data = self._runCmds([{ 'cmd': cmd }]) > > + except DMError, e: > > + raise DMError("Error uninstalling %s: %s" % (appName, e.msg)) > > again, we lost the remote device error Fixed by removing the exception wrapping.
(In reply to Andrew Halberstadt [:ahal] from comment #10) > Comment on attachment 666679 [details] [diff] [review] > Patch to mozdevice to make it return exceptions > > Review of attachment 666679 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'm r-'ing because I think we should try and preserve the original traceback > whenever we can. I don't know if there's a way to pass it along to a new > raise call, but at the very least I think we should be printing it at a > certain debug level. I'm also not a fan of the except DMError: raise DMError > pattern. Though, if you really want the new message I'd be fine with just > printing the original traceback. There's also a few other misc nits. > > Anyway, this patch is really awesome, can't wait to get it landed. Thanks! > > ::: mozdevice/mozdevice/devicemanagerADB.py > @@ +177,4 @@ > > destname = destname + "/" + os.path.basename(localname) > > return True > > except: > > + raise DMError("Error pushing file to device") > > So I think we should try to preserve the original traceback wherever > possible. I'm not sure how to do it with custom exceptions otoh, but maybe > we could at least print the original traceback to a certain log level: > import traceback > if self.debug >= 3: > traceback.print_exc() > > Or letting the exception propagate would be easier (assuming it doesn't > break the buildbot code). As I understand things it's ok to let the exceptions propagate, as long as we have tbpl-friendly exception names in them (e.g. "Automation Error", "Remote Automation Error"). I removed the reraising of exceptions and made sure that we had these strings in everything we're raising. > @@ +607,5 @@ > > + destPath - Destination directory to where the application should be installed (optional) > > + ipAddr - IP address to await a callback ping to let us know that the device has updated > > + properly - defaults to current IP. > > + port - port to await a callback ping to let us know that the device has updated properly > > + defaults to 30000, and counts up from there if it finds a conflict > > I know this is how dmSUT works, but should we really document it in the > dmADB source if it doesn't do anything? No strong preference, so feel free > to disregard. Yeah, this is pretty awful. I'd rather file a followup bug to fix this though, as the changes here are already larger in scope than I'd like. > @@ +628,5 @@ > > Directive indicates the information you want to get, your choices are: > > os - name of the os > > id - unique id of the device > > uptime - uptime of the device > > + uptimemillis - uptime of the device in milliseconds (NOT supported on all implementations) > > Same comment as above > > @@ -818,2 @@ > > """ > > - data = self._runCmd(["uninstall", appName]).stdout.read().strip() > > Did you mean to remove this? Nope. :) Fixed. > ::: mozdevice/mozdevice/devicemanagerSUT.py > @@ +181,4 @@ > > try: > > sent = self._sock.send(cmdline) > > if sent != len(cmdline): > > + raise DMError("ERROR: our cmd was %s bytes and we " > > Will this message get lost? Hmm, it shouldn't. I fixed it to say "Remote Device Error" though. > @@ +405,2 @@ > > """ > > + return self.isDir(dirname) > > Is there a reason for keeping this other than backwards compatibility? We > should at least indicate it's deprecated if not remove it completely. It looks like we were pretty much always using dirExists in our existing code (and were using isDir only internally). I decided to keep that one instead, and removed isDir. However, isDir did have a better internal implementation (faster/simpler) so we're now using the guts of it with dirExists.
(In reply to Joel Maher (:jmaher) from comment #8) > Comment on attachment 666685 [details] [diff] [review] > Corresponding changes to remote automation > > Review of attachment 666685 [details] [diff] [review]: > ----------------------------------------------------------------- > > 1) we are making changes to b2gautomation.py which remove shell and switch > to _checkCmd. That should be in a different patch. > 2) we currently have specific error messages to indicate where we failed in > the automation, the changes in this patch just bubble up a pushfile failure, > which doesn't help. > > ::: build/mobile/remoteautomation.py > @@ +157,4 @@ > > > > @property > > def pid(self): > > + return self.dm.processExist(self.procName) > > I really don't like this. We have scenarios where fennec doesn't start up > for a long time (or at least in the past) which mean we need to return 0 > instead of throw an exception. Well, we should only throw an exception if there is a genuine problem. Otherwise, we'll return either the pid or None. If you like I could return 0 in the case we don't have a process though. > ::: testing/mochitest/runtestsremote.py > @@ -308,5 @@ > > return "NO_CHROME_ON_DROID" > > path = '/'.join(parts[:-1]) > > manifest = path + "/chrome/" + os.path.basename(filename) > > - if self._dm.pushFile(filename, manifest) == False: > > - raise devicemanager.FileError("Unable to install Chrome files on device.") > > my biggest concern here is we are going to be outputting a different error > message by just letting the exception from pushFile bubble up. This is true. What would you prefer we do instead though? I guess I could catch the exception and print that out as something like "Automation Error: Unable to install Chrome files on device.". Would that be acceptable?
(In reply to William Lachance (:wlach) from comment #14) > > @@ +671,4 @@ > > > # prompt should follow > > > read_exact(len(prompt), buf, 'could not find prompt') > > > # failures are expected, so don't use "Remote Device Error" or we'll RETRY > > > + raise DMError("pulling file '%s' unsuccessful: %s" % (remoteFile, error_str)) > > > > all DMError() calls should have DeviceManager in the front so tbpl can match > > the error. > > Really? That's not consistent with what was in the file previously. I > changed the above to "automation error". Please don't change it to that. The magic strings "Automation Error:" and "Remote Device Error:" should only be used for errors that will abort the run; otherwise TBPL's summary will be full of false positives (and in the case of "Remote Device Error:" buildbot will retry the whole run even if it would otherwise have been successful). At the moment we get loads of "pulling foo unsuccessful:" (and jmaher said this error is actually expected/more of a warning), so it shouldn't use either of those error strings.
For the two cases where I mentioned we don't check for None, I would really like to see us check for it for the sole reason getting a typeError in a tbpl log will most likely be confusing and not help us get to the root cause. Here is one of the two examples, followed by the response: > @@ +696,3 @@ > > > > fhandle = open(localFile, 'wb') > > + fhandle.write(data) > > what if data is None? we will generate a TypeError and it will be confusing > to debug. data should not be None. We get at least an empty string (we throw an exception on failure). -------- everything else is looking good. Big changes for little dm.
(In reply to Ed Morley [:edmorley UTC+1] from comment #17) > (In reply to William Lachance (:wlach) from comment #14) > > > @@ +671,4 @@ > > > > # prompt should follow > > > > read_exact(len(prompt), buf, 'could not find prompt') > > > > # failures are expected, so don't use "Remote Device Error" or we'll RETRY > > > > + raise DMError("pulling file '%s' unsuccessful: %s" % (remoteFile, error_str)) > > > > > > all DMError() calls should have DeviceManager in the front so tbpl can match > > > the error. > > > > Really? That's not consistent with what was in the file previously. I > > changed the above to "automation error". > > Please don't change it to that. The magic strings "Automation Error:" and > "Remote Device Error:" should only be used for errors that will abort the > run; otherwise TBPL's summary will be full of false positives (and in the > case of "Remote Device Error:" buildbot will retry the whole run even if it > would otherwise have been successful). At the moment we get loads of > "pulling foo unsuccessful:" (and jmaher said this error is actually > expected/more of a warning), so it shouldn't use either of those error > strings. Fair enough, I'll fix. To be honest though, I think mozdevice is really the wrong place to be communicating stuff to tbpl. Many many things outside of tbpl automation use it, at some point someone's bound to unwittingly make some kind of change to it which is either going to obscure or inappropriately highlight an error. I think we'd do much better at making sure that the actual scripts/libraries we use in mozilla-central (remoteautomation.py et al) catch errors where appropriate and print out the right kinds of messages. That way we only have to worry about making sure that mozdevice internally consistent with its documentation.
(In reply to William Lachance (:wlach) from comment #19) > I think we'd do much better at making sure that the actual scripts/libraries > we use in mozilla-central (remoteautomation.py et al) catch errors where > appropriate and print out the right kinds of messages. That way we only have > to worry about making sure that mozdevice internally consistent with its > documentation. We can switch to whichever strings you like, if that helps? mozdevice-error mozdevice-warning mozdevice-info ...etc? It just seems at the moment the problem is that mozdevice isn't even consistent. ie: why are we raising a DM*Error* for an expected warning/info case?
(In reply to Ed Morley [:edmorley UTC+1] from comment #20) > (In reply to William Lachance (:wlach) from comment #19) > > I think we'd do much better at making sure that the actual scripts/libraries > > we use in mozilla-central (remoteautomation.py et al) catch errors where > > appropriate and print out the right kinds of messages. That way we only have > > to worry about making sure that mozdevice internally consistent with its > > documentation. > > We can switch to whichever strings you like, if that helps? > mozdevice-error > mozdevice-warning > mozdevice-info > ...etc? > > It just seems at the moment the problem is that mozdevice isn't even > consistent. > > ie: why are we raising a DM*Error* for an expected warning/info case? Exceptions in python are a pretty standard way of indicating failure (and their names usually have error in them). Even something as simple as deleting a file can raise an OSError if it doesn't exist. That said, even with this patch, we try pretty darn hard to make things work (multiple retries, etc.) in most cases before throwing an exception and are generally pretty forgiving about the input we accept for commands (e.g. mkDir will quietly check if a directory exists before trying to create it, and silently return if it is). If there's a way we can make mozdevice better for its use with tbpl I'm definitely open to suggestions, just that I think there's no substitute for using it carefully in client code and generating the errors you want to see from there if there's a problem. (obviously that's outside the scope of this bug)
(In reply to William Lachance (:wlach) from comment #21) > Exceptions in python are a pretty standard way of indicating failure Fair enough; Python noob here not yet used to the Python way :-) (It just seemed drastic to raise an exception for a common 'failure-that-we're-happy-to-retry-and-ignore' case)
To avoid the need to re-read the entire patch, I'm just attaching the changes for review. Should you want to look at the whole thing in its entirety, check out: http://people.mozilla.com/~wlachance/0001-Bug-796863-Making-getting-recording-logcat-not-requi.patch
Attachment #667476 - Flags: review?(ahalberstadt)
Attachment #667476 - Flags: feedback?(jmaher)
(In reply to William Lachance (:wlach) from comment #16) > (In reply to Joel Maher (:jmaher) from comment #8) > > Comment on attachment 666685 [details] [diff] [review] > > Corresponding changes to remote automation > > > > Review of attachment 666685 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > 1) we are making changes to b2gautomation.py which remove shell and switch > > to _checkCmd. That should be in a different patch. > > 2) we currently have specific error messages to indicate where we failed in > > the automation, the changes in this patch just bubble up a pushfile failure, > > which doesn't help. > > > > ::: build/mobile/remoteautomation.py > > @@ +157,4 @@ > > > > > > @property > > > def pid(self): > > > + return self.dm.processExist(self.procName) > > > > I really don't like this. We have scenarios where fennec doesn't start up > > for a long time (or at least in the past) which mean we need to return 0 > > instead of throw an exception. > > Well, we should only throw an exception if there is a genuine problem. > Otherwise, we'll return either the pid or None. If you like I could return 0 > in the case we don't have a process though. I would prefer to return a 0, until we can have a reliable way to determine if the process is running and query the status of it we should continue this, how about a comment that this should change in the future? > > > ::: testing/mochitest/runtestsremote.py > > @@ -308,5 @@ > > > return "NO_CHROME_ON_DROID" > > > path = '/'.join(parts[:-1]) > > > manifest = path + "/chrome/" + os.path.basename(filename) > > > - if self._dm.pushFile(filename, manifest) == False: > > > - raise devicemanager.FileError("Unable to install Chrome files on device.") > > > > my biggest concern here is we are going to be outputting a different error > > message by just letting the exception from pushFile bubble up. > > This is true. What would you prefer we do instead though? I guess I could > catch the exception and print that out as something like "Automation Error: > Unable to install Chrome files on device.". Would that be acceptable? We probably don't need as much details as possible, but it would be really nice to know that we had an 'Automation Error: failure to copy profile files to the device' or something like that. Having a random exception does us no good for using the logfiles to fix problems.
Comment on attachment 667476 [details] [diff] [review] Patch on patch which address concerns Review of attachment 667476 [details] [diff] [review]: ----------------------------------------------------------------- Looks good except for an indentation error and the dangling isDir function definition. r-'ing for those reasons but I'll change to an r+ if you comment saying you addressed them and the few nits. ::: mozdevice/mozdevice/devicemanagerADB.py @@ +663,5 @@ > + data = self._runCmd(["uninstall", appName]).stdout.read().strip() > + status = data.split('\n')[0].strip() > + if status == 'Success': > + return > + raise DMError("uninstall failed for %s. Got: %s" % (appName, status)) nit: if status != 'Success': raise DMError(..) ::: mozdevice/mozdevice/devicemanagerSUT.py @@ +375,1 @@ > if (parts[1] == ""): I think the indentation got messed up here. @@ +389,4 @@ > """ > + ret = self._runCmds([{ 'cmd': 'isdir ' + remotePath }]).strip() > + if not ret: > + raise DMError('Automation Error: DeviceManager isdir returned null') nit: string should say dirExists @@ +679,5 @@ > self.getDirectory(remotePath, localPath, False) > else: > self.getFile(remotePath, localPath) > > def isDir(self, remotePath): Forgot to remove this?
Attachment #667476 - Flags: review?(ahalberstadt) → review-
Agree with all comments except one, discussed below: (In reply to Andrew Halberstadt [:ahal] from comment #25) > @@ +389,4 @@ > > """ > > + ret = self._runCmds([{ 'cmd': 'isdir ' + remotePath }]).strip() > > + if not ret: > > + raise DMError('Automation Error: DeviceManager isdir returned null') > > nit: string should say dirExists We're actually referring to the internal command here. IMO we should keep this as-is (it's not a great error message but at least it's no worse than before) If that's ok I'll do up one last patch and throw it against try.
Attached patch Updated patch against m-c (obsolete) — Splinter Review
Addressing review concerns
Attachment #666685 - Attachment is obsolete: true
Attachment #666685 - Flags: feedback?(ahalberstadt)
Attachment #667520 - Flags: review?(jmaher)
Attachment #667520 - Flags: review?(jmaher) → review+
Comment on attachment 667476 [details] [diff] [review] Patch on patch which address concerns Review of attachment 667476 [details] [diff] [review]: ----------------------------------------------------------------- after discussing in irc to throw new exceptions or pass along existing exceptions, I believe this code will be fine assuming you resolve the nit's from ahal. Thanks for moving mozdevice into a more robust and reusable state! ::: mozdevice/mozdevice/devicemanagerADB.py @@ +199,5 @@ > # contains symbolic links, the links are pushed, rather than the linked > # files; we either zip/unzip or push file-by-file to get around this > # limitation > + if (not self.dirExists(remoteDir)): > + self.mkDirs(remoteDir+"/x") why the "/x" ?
Attachment #667476 - Flags: feedback?(jmaher) → feedback+
(In reply to Joel Maher (:jmaher) from comment #28) > Comment on attachment 667476 [details] [diff] [review] > Patch on patch which address concerns > > Review of attachment 667476 [details] [diff] [review]: > ----------------------------------------------------------------- > > after discussing in irc to throw new exceptions or pass along existing > exceptions, I believe this code will be fine assuming you resolve the nit's > from ahal. > > Thanks for moving mozdevice into a more robust and reusable state! > > ::: mozdevice/mozdevice/devicemanagerADB.py > @@ +199,5 @@ > > # contains symbolic links, the links are pushed, rather than the linked > > # files; we either zip/unzip or push file-by-file to get around this > > # limitation > > + if (not self.dirExists(remoteDir)): > > + self.mkDirs(remoteDir+"/x") > > why the "/x" ? I think that's just because mkDirs only creates directories immediately below the last element (I guess because the last element is assumed to be a path). It's all very confusing.
(In reply to William Lachance (:wlach) from comment #30) > I think that's just because mkDirs only creates directories immediately > below the last element (I guess because the last element is assumed to be a > path). It's all very confusing. Assumed to be a file, I mean.
Here's what I'd like to push against m-c, it addresses all your comments from yesterday except for the one I mentioned, and fixes a few problems I found with undefined types when running against try. (successful tryserver run here: https://tbpl.mozilla.org/?tree=Try&rev=02bc11d1e134) (unfortunately I need to push this directly against m-c so as not to break anything -- I'll update mozbase at the same time if I can)
Attachment #666679 - Attachment is obsolete: true
Attachment #667476 - Attachment is obsolete: true
Attachment #667520 - Attachment is obsolete: true
Attachment #667942 - Flags: review?(ahalberstadt)
Comment on attachment 667942 [details] [diff] [review] Final patch against m-c Review of attachment 667942 [details] [diff] [review]: ----------------------------------------------------------------- LGTM, thanks for doing this! Although I don't quite understand why landing this on inbound first would break things.
Attachment #667942 - Flags: review?(ahalberstadt) → review+
I meant m-i. :) Inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/6e34c2118a92 (In reply to Andrew Halberstadt [:ahal] from comment #33) > Comment on attachment 667942 [details] [diff] [review] > Final patch against m-c > > Review of attachment 667942 [details] [diff] [review]: > ----------------------------------------------------------------- > > LGTM, thanks for doing this! Although I don't quite understand why landing > this on inbound first would break things.
And here's the mozbase commit: https://github.com/mozilla/mozbase/commit/f2cfb5e1ee6dc50449d222da1246ca1e39bfac14 I needed to do some updates to some of the tests to work properly (reviewed by ahal)
Depends on: 797996
It appears that reboot() will no longer indicate any sort of failure, even if the device never calls back. Am I understanding this correctly? If so, it should be fixed (I can file another bug).
(In reply to Mark Côté ( :mcote ) from comment #36) > It appears that reboot() will no longer indicate any sort of failure, even > if the device never calls back. Am I understanding this correctly? If so, it > should be fixed (I can file another bug). Yes, that does appear to be a problem with the patch that we didn't notice. I filed the follow up bug 804029 to deal with that. This one's already left the station.
(In reply to William Lachance (:wlach) from comment #34) > I meant m-i. :) > > Inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/6e34c2118a92 > > (In reply to Andrew Halberstadt [:ahal] from comment #33) > > Comment on attachment 667942 [details] [diff] [review] > > Final patch against m-c > > > > Review of attachment 667942 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > LGTM, thanks for doing this! Although I don't quite understand why landing > > this on inbound first would break things. I've just spotted this had a wrong bug number; which is why the merge to m-c didn't get marked here. I've commented in the bug in case anyone else ends up there when looking at hg annotate. https://hg.mozilla.org/mozilla-central/rev/6e34c2118a92
What is the status of this bug? I had thought this had been closed awhile ago.
Indeed, this was pushed ages ago. Resolving.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: