Closed Bug 780031 Opened 12 years ago Closed 12 years ago

Mochitest on B2G reliability improvements

Categories

(Testing :: Mochitest, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla17

People

(Reporter: onecyrenus, Assigned: jgriffin)

References

Details

(Whiteboard: u=b2g c=ateam p=1)

Attachments

(2 files, 1 obsolete file)

Attached patch Diff For series of patches (obsolete) — Splinter Review
There are several reliability enhancements that I would like to get in. 

#1) When we reboot a device we have a sleep (40) seconds to wait for the wlan to come up. I have replaced this with a loop to check when the networking interface has an IP address. This works 100% 

#2) Changed the way we push files onto the device in that we never push new files onto the device if a .orig file exists. 
--- This manifested itself in sometimes overwriting the original file.  If you interrupted the previous run. 

#3) B2GInstance now actually has b2g running in a process in a separate thread. 
This gives us access to test output results, whereas before they were hidden. 

#4) We run cleanup before each run, and after.  The idea here is to make sure we cleanup any terminated runs before we start a new one. 

#5) self.runCmd is watched by watchProcess, and if there is no response after 15 seconds the process is terminated, and a retry attempted. This puts the onus on the programmer as to the number of times you run a command before you give up. (only meh on this implementation, but it does seem to work )
Blocks: 771725
Attachment #648550 - Attachment is patch: true
Attachment #648550 - Flags: review?(jgriffin)
Thanks for the review !
This is just review = ?.  I'll make sure this works with the emulator before giving it r+.
Comment on attachment 648550 [details] [diff] [review]
Diff For series of patches

Review of attachment 648550 [details] [diff] [review]:
-----------------------------------------------------------------

In general, I like this approach, but it totally breaks testing on the emulator.  

One issue is that for some reason you cannot start B2G on the emulator using 'adb shell /system/b2g/b2g'; you get this error:

"link_image[1936]:   448 could not load needed library 'libmozglue.so' for './b2g' (load_library[1091]: Library 'libmozglue.so' not found)CANNOT LINK EXECUTABLE"

This can be fixed by adding LD_LIBRARY_PATH=/system/b2g:%LD_LIBRARY_PATH% prior to the 'shell /system/b2g' command.

After that, there a bunch of timing related problems on the emulator that this patch causes.  If you'll fix the items below, I'll masssage it so it works on the emulator as well.

Regarding your changes to devicemanagerADB.py, what problem is that solving?  Are you seeing cases where a command is getting stuck?  I don't see any places in this patch where we watch for that and retry the command.  If these changes are still needed, please break them into a separate patch (since it can potentially impact more than just B2G testing, and we'll need to get them reviewed by more people).

::: build/mobile/b2gautomation.py
@@ +76,4 @@
>          env['MOZ_HIDE_RESULTS_TABLE'] = '1'
>          return env
>  
> +    def waitForWlan(self): 

This breaks on the emulator and the panda, since there is no wlan0 interface, but a eth0 interface instead.  At a minimum, we should use 'netcfg' to see what interfaces there are, and pick either wlan0 or eth0, but even this seems fragile.

Additionally, this will loop forever if the network never comes up.  There should be a timeout here to prevent that.

::: build/mobile/devicemanagerADB.py
@@ +702,5 @@
> +        print "attempting to terminate"
> +        # in the case the process did not complete, we kill it
> +        process.terminate()
> +        # and fill the return code with some error value
> +        returncode = -1  # (comment 2)

What do (comment 1) and (comment 2) refer to?  Should probably be removed.

@@ +720,4 @@
>        args.insert(1, "run-as")
>        args.insert(2, self.packageName)
>      finalArgs.extend(args)
> +    return self.watchProcess(15, subprocess.Popen(finalArgs, stdout=subprocess.PIPE, stderr=subprocess.STDOUT))

We shouldn't hardcode 15, I think.  We probably want to preserve the original behavior of having no timeout, but allow callers to specify one if they want it.

@@ +737,4 @@
>      if (not self.haveRoot and self.useRunAs and args[0] == "shell" and args[1] != "run-as"):
>        args.insert(1, "run-as")
>        args.insert(2, self.packageName)
> +      

Please be careful about changing lines just to add unnecessary spaces, like here.

::: testing/mochitest/runtestsb2g.py
@@ +235,5 @@
> +                    self._dm.checkCmdAs(['shell', 'cp', '%s.orig' % self.remoteProfilesIniPath, self.remoteProfilesIniPath])
> +
> +                # Remove the backup copy
> +                self._dm.checkCmdAs(['shell', 'rm', '%s.orig' % self.userJS])
> +                self._dm.checkCmdAs(['shell', 'rm', '%s.orig' % self.remoteProfilesIniPath])

should just use self._dm.removeFile() here

@@ +450,3 @@
>      retVal = 1
>      try:
> +        mochitest.cleanup(None, options)

calling cleanup here will result in an unnecessary reboot; should prevent that
Attachment #648550 - Flags: review?(jgriffin) → review-
removed watch process, added / removed extraneous empty lines, made removing profile directory one command instead off using deviceManager's recursive delete.
waitForNet now returns fals if a network is not active within 40 seconds.
Attachment #648550 - Attachment is obsolete: true
Attachment #649466 - Flags: review?
Attachment #649466 - Flags: review? → review?(jgriffin)
Comment on attachment 649466 [details] [diff] [review]
patch with above fixes

>diff --git a/build/mobile/b2gautomation.py b/build/mobile/b2gautomation.py
>index 55e98fc..a819bc4 100644
>--- a/build/mobile/b2gautomation.py
>+++ b/build/mobile/b2gautomation.py
>@@ -18,8 +18,8 @@ from devicemanager import DeviceManager, NetworkTools
> from mozprocess import ProcessHandlerMixin
> 
> 
>-class LogcatProc(ProcessHandlerMixin):
>-    """Process handler for logcat which puts all output in a Queue.
>+class StdOutProc(ProcessHandlerMixin):
>+    """Process handler for b2g which puts all output in a Queue.
>     """
> 
>     def __init__(self, cmd, queue, **kwargs):
>@@ -76,6 +76,20 @@ class B2GRemoteAutomation(Automation):
>         env['MOZ_HIDE_RESULTS_TABLE'] = '1'
>         return env
> 
>+    def waitForNet(self): 
>+        active = False
>+        time_out = 0 
>+        while not active and time_out < 40: 
>+            data = self._devicemanager.runCmd(['shell', '/system/bin/netcfg']).stdout.readlines()
>+            data.pop(0)
>+            for line in data:
>+                if (re.search(r'UP\s+(?:[0-9]{1,3}\.){3}[0-9]{1,3}', line)):
>+                    active = True
>+                    break
>+            time_out += 1
>+            time.sleep(1) 
>+        return active      
>+
>     def checkForCrashes(self, directory, symbolsPath):
>         # XXX: This will have to be updated after crash reporting on b2g
>         # is in place.
>@@ -160,7 +174,7 @@ class B2GRemoteAutomation(Automation):
>         serial, status = self.getDeviceStatus()
> 
>         # reboot!
>-        self._devicemanager.checkCmd(['reboot'])
>+        self._devicemanager.runCmd(['shell', '/system/bin/reboot'])
> 
>         # wait for device to come back to previous status
>         print 'waiting for device to come back online after reboot'
>@@ -182,9 +196,6 @@ class B2GRemoteAutomation(Automation):
>         # test url using Marionette.  There doesn't seem to be any way
>         # to pass env variables into the B2G process, but this doesn't 
>         # seem to matter.
>-
>-        instance = self.B2GInstance(self._devicemanager)
>-
>         # reboot device so it starts up with the mochitest profile
>         # XXX:  We could potentially use 'stop b2g' + 'start b2g' to achieve
>         # a similar effect; will see which is more stable while attempting
>@@ -193,11 +204,20 @@ class B2GRemoteAutomation(Automation):
>             self.restartB2G()
>         else:
>             self.rebootDevice()
>+        time.sleep(5)
>+        # wait for wlan to come up 
>+        if not self.waitForNet():
>+          raise Exception("network did not come up, please configure the network" + 
>+                          " prior to running before running the automation framework")        
>+
>+        # stop b2g
>+        self._devicemanager.runCmd(['shell', 'stop', 'b2g'])
>+        time.sleep(2)
> 
>-        # Infrequently, gecko comes up before networking does, so wait a little
>-        # bit to give the network time to become available.
>-        # XXX:  need a more robust mechanism for this
>-        time.sleep(40)
>+        # relaunch b2g inside b2g instance
>+        instance = self.B2GInstance(self._devicemanager)
>+        
>+        time.sleep(5);
> 
>         # Set up port forwarding again for Marionette, since any that
>         # existed previously got wiped out by the reboot.
>@@ -205,6 +225,8 @@ class B2GRemoteAutomation(Automation):
>             self._devicemanager.checkCmd(['forward',
>                                           'tcp:%s' % self.marionette.port,
>                                           'tcp:%s' % self.marionette.port])
>+        
>+        time.sleep(2)
> 
>         # start a marionette session
>         session = self.marionette.start_session()
>@@ -228,26 +250,28 @@ class B2GRemoteAutomation(Automation):
> 
>         def __init__(self, dm):
>             self.dm = dm
>-            self.logcat_proc = None
>+            self.stdout_proc = None
>             self.queue = Queue.Queue()
> 
>-            # Launch logcat in a separate thread, and dump all output lines
>+            # Launch b2g in a separate thread, and dump all output lines
>             # into a queue.  The lines in this queue are
>             # retrieved and returned by accessing the stdout property of
>             # this class.
>             cmd = [self.dm.adbPath]
>             if self.dm.deviceSerial:
>                 cmd.extend(['-s', self.dm.deviceSerial])
>-            cmd.append('logcat')
>-            proc = threading.Thread(target=self._save_logcat_proc, args=(cmd, self.queue))
>+            cmd.append('shell')
>+            cmd.append('LD_LIBRARY_PATH=/system/b2g:%LD_LIBRARY_PATH%') 
>+            cmd.append('/system/b2g/b2g')
>+            proc = threading.Thread(target=self._save_stdout_proc, args=(cmd, self.queue))
>             proc.daemon = True
>             proc.start()
> 
>-        def _save_logcat_proc(self, cmd, queue):
>-            self.logcat_proc = LogcatProc(cmd, queue)
>-            self.logcat_proc.run()
>-            self.logcat_proc.waitForFinish()
>-            self.logcat_proc = None
>+        def _save_stdout_proc(self, cmd, queue):
>+            self.stdout_proc = StdOutProc(cmd, queue)
>+            self.stdout_proc.run()
>+            self.stdout_proc.waitForFinish()
>+            self.stdout_proc = None
> 
>         @property
>         def pid(self):
>@@ -257,7 +281,7 @@ class B2GRemoteAutomation(Automation):
>         @property
>         def stdout(self):
>             # Return any lines in the queue used by the
>-            # logcat process handler.
>+            # b2g process handler.
>             lines = []
>             while True:
>                 try:
>diff --git a/build/mobile/devicemanagerADB.py b/build/mobile/devicemanagerADB.py
>index 779c771..6656341 100644
>--- a/build/mobile/devicemanagerADB.py
>+++ b/build/mobile/devicemanagerADB.py
>@@ -617,7 +617,7 @@ class DeviceManagerADB(DeviceManager):
>   #  success: status from test agent
>   #  failure: None
>   def reboot(self, wait = False):
>-    ret = self.runCmd(["reboot"]).stdout.read()
>+    ret = self.runCmd(['shell', '/system/bin/reboot']).stdout.read()
>     if (not wait):
>       return "Success"
>     countdown = 40
>diff --git a/testing/mochitest/runtestsb2g.py b/testing/mochitest/runtestsb2g.py
>index 111dd126..2cde2a8 100644
>--- a/testing/mochitest/runtestsb2g.py
>+++ b/testing/mochitest/runtestsb2g.py
>@@ -194,32 +194,36 @@ class B2GMochitest(Mochitest):
>         self.remoteProfile = options.remoteTestRoot + '/profile'
>         self._automation.setRemoteProfile(self.remoteProfile)
>         self.remoteLog = options.remoteLogFile
>+        self.localLog = None
>         self.userJS = '/data/local/user.js'
>         self.remoteMozillaPath = '/data/b2g/mozilla'
>         self.remoteProfilesIniPath = os.path.join(self.remoteMozillaPath, 'profiles.ini')
>         self.originalProfilesIni = None
>+    
>+    def origUserJSExists(self):
>+        return self._dm.fileExists('/data/local/user.js.orig')
> 
>     def cleanup(self, manifest, options):
>-        # Restore the original profiles.ini.
>-        if self.originalProfilesIni:
>-            try:
>-                if not options.emulator:
>-                    self.restoreProfilesIni()
>-                os.remove(self.originalProfilesIni)
>-            except:
>-                pass
>-
>-        if not options.emulator:
>+        if self.localLog:
>             self._dm.getFile(self.remoteLog, self.localLog)
>             self._dm.removeFile(self.remoteLog)
>-            self._dm.removeDir(self.remoteProfile)
>+            self._dm.checkCmdAs(['shell', 'rm', '-r', self.remoteProfile])
>+            
>+        if  self.origUserJSExists():
>+            # Restore the original user.js and Profile.ini
>+            self._dm.removeFile(self.userJS)
>+            self._dm.removeFile(self.remoteProfilesIniPath)
> 
>-            # Restore the original user.js.
>-            self._dm.checkCmdAs(['shell', 'rm', '-f', self.userJS])
>             if self._dm.useDDCopy:
>                 self._dm.checkCmdAs(['shell', 'dd', 'if=%s.orig' % self.userJS, 'of=%s' % self.userJS])
>+                self._dm.checkCmdAs(['shell', 'dd', 'if=%s.orig' % self.remoteProfilesIniPath, 'of=%s' % self.remoteProfilesIniPath])
>             else:
>                 self._dm.checkCmdAs(['shell', 'cp', '%s.orig' % self.userJS, self.userJS])
>+                self._dm.checkCmdAs(['shell', 'cp', '%s.orig' % self.remoteProfilesIniPath, self.remoteProfilesIniPath])
>+
>+            # Remove the backup copy
>+            self._dm.removeFile("%s.orig" % self.userJS)
>+            self._dm.removeFile("%s.orig" % self.remoteProfilesIniPath)
> 
>             # We've restored the original profile, so reboot the device so that
>             # it gets picked up.
>@@ -313,14 +317,6 @@ class B2GMochitest(Mochitest):
>         options.profilePath = self.remoteProfile
>         return manifest
> 
>-    def restoreProfilesIni(self):
>-        # restore profiles.ini on the device to its previous state
>-        if not self.originalProfilesIni or not os.access(self.originalProfilesIni, os.F_OK):
>-            raise DMError('Unable to install original profiles.ini; file not found: %s',
>-                          self.originalProfilesIni)
>-
>-        self._dm.pushFile(self.originalProfilesIni, self.remoteProfilesIniPath)
>-
>     def updateProfilesIni(self, profilePath):
>         # update profiles.ini on the device to point to the test profile
>         self.originalProfilesIni = tempfile.mktemp()
>@@ -338,8 +334,11 @@ class B2GMochitest(Mochitest):
>             config.write(configfile)
> 
>         self._dm.pushFile(newProfilesIni, self.remoteProfilesIniPath)
>+        self._dm.pushFile(self.originalProfilesIni, '%s.orig' % self.remoteProfilesIniPath)
>+
>         try:
>             os.remove(newProfilesIni)
>+            os.remove(self.originalProfilesIni)
>         except:
>             pass
> 
>@@ -369,21 +368,19 @@ user_pref("network.dns.localDomains","app://system.gaiamobile.org");\n
>         f.close()
> 
>         # Copy the profile to the device.
>-        self._dm.removeDir(self.remoteProfile)
>+        self._dm.checkCmdAs(['shell', 'rm', '-r', self.remoteProfile])
>         if self._dm.pushDir(options.profilePath, self.remoteProfile) == None:
>             raise devicemanager.FileError("Unable to copy profile to device.")
> 
>         # In B2G, user.js is always read from /data/local, not the profile
>         # directory.  Backup the original user.js first so we can restore it.
>-        self._dm.checkCmdAs(['shell', 'rm', '-f', '%s.orig' % self.userJS])
>-        if self._dm.useDDCopy:
>-            self._dm.checkCmdAs(['shell', 'dd', 'if=%s' % self.userJS, 'of=%s.orig' % self.userJS])
>-        else:
>-            self._dm.checkCmdAs(['shell', 'cp', self.userJS, '%s.orig' % self.userJS])
>-        self._dm.pushFile(os.path.join(options.profilePath, "user.js"), self.userJS)
>-
>-        self.updateProfilesIni(self.remoteProfile)
>-
>+        if (self._dm.fileExists('%s.orig' % self.userJS) is not True):
>+          if self._dm.useDDCopy:
>+              self._dm.checkCmdAs(['shell', 'dd', 'if=%s' % self.userJS, 'of=%s.orig' % self.userJS])
>+          else:
>+              self._dm.checkCmdAs(['shell', 'cp', self.userJS, '%s.orig' % self.userJS])
>+          self._dm.pushFile(os.path.join(options.profilePath, "user.js"), self.userJS)
>+          self.updateProfilesIni(self.remoteProfile)
>         options.profilePath = self.remoteProfile
>         options.logFile = self.localLog
>         return retVal
>@@ -419,7 +416,6 @@ def main():
>         kwargs.update({'host': options.deviceIP,
>                        'port': options.devicePort})
>     dm = devicemanagerADB.DeviceManagerADB(**kwargs)
>-
>     auto.setDeviceManager(dm)
>     options = parser.verifyRemoteOptions(options, auto)
>     if (options == None):
>@@ -435,12 +431,12 @@ def main():
>         sys.exit(1)
> 
>     logParent = os.path.dirname(options.remoteLogFile)
>-    dm.mkDir(logParent);
>+    dm.mkDir(logParent)
>     auto.setRemoteLog(options.remoteLogFile)
>     auto.setServerInfo(options.webServer, options.httpPort, options.sslPort)
>-
>     retVal = 1
>     try:
>+        mochitest.cleanup(None, options)
>         retVal = mochitest.runTests(options)
>     except:
>         print "TEST-UNEXPECTED-FAIL | %s | Exception caught while running tests." % sys.exc_info()[1]
Attachment #649466 - Attachment is patch: true
Comment on attachment 649466 [details] [diff] [review]
patch with above fixes

Looks good!
Attachment #649466 - Flags: review?(jgriffin) → review+
David, I've updated the patch to make it run more smoothly on the emulator; can you test it on the device to make sure I didn't break anything there?
Attachment #649511 - Flags: review?
Attachment #649511 - Flags: review? → review?(dclarke)
/system/bin/reboot was because if busybox is installed the reboot command doesn't reboot anymore. it requires reboot -f. 

I think it might make sense to be more clear about which command is being run. 

For me busybox is very helpful on device. Does /system/bin/reboot not work on emulator ?
 
+    def copyRemoteFile(self, src, dest):
+        if self._dm_useDDCopy:
+            self._dm.checkCmdAs(['shell', 'dd', 'if=%s' % src,'of=%s' % dest])
+        else:
+            self._dm.checkCmdAs(['shell', 'cp', src, dest])
+
looks like a typo here

_dm.useDDCopy
(In reply to dclarke@mozilla.com  [:onecyrenus] from comment #8)
> /system/bin/reboot was because if busybox is installed the reboot command
> doesn't reboot anymore. it requires reboot -f. 
> 
> I think it might make sense to be more clear about which command is being
> run. 
> 
> For me busybox is very helpful on device. Does /system/bin/reboot not work
> on emulator ?

Device Manager is used by Fennec as well, and I can't say whether /system/bin/reboot works on all the devices we test.  At any rate, I didn't see any call sites for devicemanager.reboot(); it looks like you had changed them all for B2G to explicitly call "shell /system/bin/reboot".
(In reply to dclarke@mozilla.com  [:onecyrenus] from comment #9)
>  
> +    def copyRemoteFile(self, src, dest):
> +        if self._dm_useDDCopy:
> +            self._dm.checkCmdAs(['shell', 'dd', 'if=%s' % src,'of=%s' %
> dest])
> +        else:
> +            self._dm.checkCmdAs(['shell', 'cp', src, dest])
> +
> looks like a typo here
> 
> _dm.useDDCopy

Yep, thanks for finding that!
Whiteboard: u=b2g c=ateam p=1
Attachment #649511 - Flags: review?(dclarke)
Made some additional fixes per our conversation on IRC, verified on both device and emulator, and landed as http://hg.mozilla.org/mozilla-central/rev/4a4eaeca4dfc.

The only outstanding issue is the networking problem on the otoro, which is not related to this patch (it occurs without it as well).  I'm closing this bug and new bugs should be filed for additional problems.
Assignee: nobody → jgriffin
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: