Develop smoketest to burn in mobile devices

RESOLVED FIXED

Status

Testing
General
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: wlach, Assigned: Callek)

Tracking

Trunk
ARM
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 5 obsolete attachments)

For bug 776728, it would be desirable to have a smoke test we could use to make sure the pandas are running and reliable. 

There's lots of things that we could do in a smoke test, but I think to start we can just look for this:

1. The panda can start fennec
2. We can reboot repeatedly

It turns out that autophone already has a basic smoke test which we should be able to adapt for our purposes, given enough work. This bug is about tracking that.
Let's create a bug for tracking the work to create an autophone-based burnin test, and make it block 776728 (perform acceptance test of panda chassis configuration). This will make it easier for us to track patches, etc. without cluttering up that bug (which also encompasses work that IT needs to do, as far as I can tell)
Blocks: 776728
OS: Linux → Android
Hardware: x86_64 → ARM
Ok, so I got something working on Friday with my LG, but I'm totally failing to get it working with my Panda.

I've created a special branch of autophone with my current changes needed to run this test here:

https://github.com/wlach/autophone/tree/781341

Installation instructions:

* Check out git://github.com/wlach/autophone.git
* Run `git checkout remotes/origin/781341`
* Create a virtualenv, activate it. Inside it, run `pip install mozdevice manifestdestiny pulsebuildmonitor mozprofile`
* Get the ip address of the panda you want to test with. Run `python multiPublish.py <ip address>`
* Run `python cigar.py`

This set of steps works fine with my phone, but totally fails with my panda. It seems like it wants to contact the panda immediately after the reboot, but is failing to do (because it takes a long time to reboot) and then throws a hissy fit. I'll attach a log of the failing output.
Created attachment 650332 [details]
Patch of autophone failing after attempting to reboot panda

Any idea why it's trying to run recover_phone immediately after a reboot was scheduled? This makes very little sense to me -- I don't see this behaviour at all with my physical phone.
you also need to:
virtualenv .
. bin/activate
pip install mozdevice manifestdestiny pulsebuildmonitor mozprofile
pip install mozhttpd


Even doing all of this on my panda it fails although it looks like it is running.  Poor panda :(
(In reply to William Lachance (:wlach) from comment #2)
> Ok, so I got something working on Friday with my LG, but I'm totally failing
> to get it working with my Panda.
> 
> I've created a special branch of autophone with my current changes needed to
> run this test here:
> 
> https://github.com/wlach/autophone/tree/781341
> 
> Installation instructions:
> 
> * Check out git://github.com/wlach/autophone.git
> * Run `git checkout remotes/origin/781341`
> * Create a virtualenv, activate it. Inside it, run `pip install mozdevice
> manifestdestiny pulsebuildmonitor mozprofile`
> * Get the ip address of the panda you want to test with. Run `python
> multiPublish.py <ip address>`
> * Run `python cigar.py`

These instructions missed a dep in the pypi install stage. The correct instructions are:

* Check out git://github.com/wlach/autophone.git
* Run `git checkout remotes/origin/781341`
* Create a virtualenv, activate it. Inside it, run `pip install mozdevice manifestdestiny pulsebuildmonitor mozprofile mozhttpd`
* Get the ip address of the panda you want to test with. Run `python multiPublish.py <ip address>`
* Run `python cigar.py`

Comment 6

6 years ago
Created attachment 653354 [details] [diff] [review]
A new patch

This patch takes Will's work a step farther. It was working on pandas, but there were connection issues in Boston, so I couldn't tell if it was working completely. But I did verify that I was properly restarting the panda, registering it, and launching the browser. 

This patch optimizes Will's work so that we can have one script to run that will burn in a whole bunch of pandas/phones.  The idea here is that you would call cigar.py with a file that contains one IP address per line and the number of times you want to run.  I have a wrapper that then takes the code in this patch and provies one shell script to run which will set up everything, run the smoke test, and remove everything after it's done (everything largely being the virtualenv).  

I think that having a single script to run here, with only one step will help us get a lot closer to our goal of having a simple burn in script that is as simple as possible to run.

Will, let me know what you think.
Attachment #653354 - Flags: feedback?(wlachance)
Comment on attachment 653354 [details] [diff] [review]
A new patch

I approve of the general ideas behind this patch.

diff --git a/tests/smoketest.py b/tests/smoketest.py
index 1a98cd6..1d7b013 100644
--- a/tests/smoketest.py
+++ b/tests/smoketest.py
@@ -34,7 +34,10 @@ class SmokeTest(PhoneTest):
 
         # Run test
         self.logger.debug('running fennec')
-        self.run_fennec_with_profile(intent, 'about:fennec')
+        #self.run_fennec_with_profile(intent, 'about:fennec')
+        # This works better on panda to use sut's exec rather than the intent
+        # to launch fennec
+        self.run_fennec_with_sut(job['androidprocname'], 'about:fennec')

Could we debug this before landing? 

I'd also like to see if we can't merge smoketest.py and cigar.py together (doesn't really have anything to do with your patch/idea, just something I'd like to see).
Attachment #653354 - Flags: feedback?(wlachance) → feedback+
What are the next steps on this bug and who owns them?  We really need a smoketest up and going both for the tegras and the pandas.
Created attachment 658283 [details] [diff] [review]
use the sut_tools for a smoketest (1.0)

this is a patch to sut_tools which updates the entire sut_tools library to the newest devicemanager*.py as well as adding smoketest.py.

* There are changes to add logging throughout the code.
* There are changes to make the existing python scripts more modules
* There are cleanups in the code to remove unnecessary imports and statements
* There is use of the logging module to log to a file and to stdout/console
* Smoketest.py utilizes the existing tools as much as possible for verify, cleanup, uninstall, install
* This assumes we have all tests/binaries from: http://people.mozilla.org/~jmaher/smoketest/smoketest.zip in the same directory

This patch is intended to be checked into the sut_tools library so we can upstream these changes to work on panda boards as well.  To get a burnin test going now, you can take the http://people.mozilla.org/~jmaher/smoketest/smoketest.zip file, unzip it and run python smoketest.py.  

I have ran this about 20 times today while cleaning up some of the logging as recommended by Callek last Friday via IRC.
Attachment #658283 - Flags: review?(bugspam.Callek)
(Assignee)

Comment 10

6 years ago
Comment on attachment 658283 [details] [diff] [review]
use the sut_tools for a smoketest (1.0)

Review of attachment 658283 [details] [diff] [review]:
-----------------------------------------------------------------

The file fails to apply cleanly, but I'm happy with the general approach, the way we do logging (and correlate it back and forth with clientproxy as well) still makes me cringe, but this is still an overall improvement except in the specific spots I mentioned.

Holding off on a real r+ until I get a chance to test on one of our staging systems, but this currently has conflicts. But its looks pretty good overall.

::: sut_tools/cleanup.py
@@ +85,2 @@
>  
> +    if doCheckStalled:

How about we split the "doCheckStalled" stuff out to another function while we're here? to make running this seperate from the device-side cleanup easier in the future as well.

::: sut_tools/installApp.py
@@ +5,5 @@
>  import devicemanagerSUT as devicemanager
>  
>  from sut_lib import getOurIP, calculatePort, clearFlag, setFlag, checkDeviceRoot, \
>                      getDeviceTimestamp, setDeviceTimestamp, \
> +                    getResolution, waitForDevice, runCommand, log, getSUTLogger

why import log *and* getSUTLogger ?  We could/should just do a global log = getSUTLogger, no need to wrap it into if __main__....

::: sut_tools/smoketest.py
@@ +5,5 @@
> +# You can obtain one at http://mozilla.org/MPL/2.0/.
> +import sys, os
> +import devicemanagerSUT as devicemanager
> +import subprocess, re
> +from sut_lib import pingTegra, connect, log, getSUTLogger

same comment about log/getSUTLogger

@@ +50,5 @@
> +                return False
> +            return True
> +    return False
> +
> +def smoketest(tegra_name, number):

Since this is a new script, can we stay away from |tegra|* in var naming for stuff that can be used for other types of boards too?

@@ +57,5 @@
> +    dm = None
> +    proxyFile = None
> +    errorFile = None
> +
> +    dm = devicemanager.DeviceManagerSUT(tegra_name, 20701)

in this case, setting dm, and doing nothing with it before we call devicemanager to set it feels like wasted time/confusion. we can drop the dm = None here.

@@ +66,5 @@
> +        log.error("failed to run verify on %s" % (tegra_name))
> +        return 1 # Not ok to proceed
> +    log.info("Successfully verified the device")
> +
> +    if not uninstallApp(processName):

verifies cleanup.py should uninstall as well, is it necessary here?

@@ +84,5 @@
> +    return 0
> +
> +if __name__ == '__main__':
> +    global appFileName, processName
> +    appFileName = "fennec-18.0a1.multi.android-arm.apk"

nit we'd want this to be a passed-in option in final, or to smartly pull from the -latest dir, or both. I'd prefer to block final landing of this patch on that fix.

If that is not easy with how this is coming together, I'd settle for a CONST listed at top of this file to be specified, if persuaded.

@@ +98,5 @@
> +            print "INFO: Using tegra '%s' found in env variable" % tegra_name
> +    else:
> +        tegra_name = sys.argv[1]
> +    
> +    num = re.compile('.*([0-9][0-9])')

nit this won't match 3-digit numbers, perhaps re.compile('.*([0-9]+)')

::: sut_tools/sut_lib.py
@@ +23,5 @@
>  import json
>  
> +def getSUTLogger(filename=None, loggername=None):
> +    if not filename:
> +        filename = "sut_tools.log"

I'd rather not log to sut_tools in every invocation that uses a sut_lib log. (like all the buildbot steps, clientproxy, etc.)

@@ +28,3 @@
>  
> +    if not loggername:
> +        loggername = "SUT_LIB"

Nit: loggername = __name__ here

@@ +31,3 @@
>  
> +    logger = logging.getLogger('')
> +    logger.handlers = []

Why are we adding a blank logger here?

@@ +44,5 @@
> +    console.setLevel(logging.INFO)
> +    root = logging.getLogger('')
> +    root.addHandler(console)
> +
> +    return logging.getLogger(loggername)

See http://mxr.mozilla.org/build/source/tools/sut_tools/clientproxy.py#74 for the way we init the loggers and handlers in clientproxy for an Echo+File logging to get a better head around how I'd like this function to be.

I'm also not sure its going to cooperate well with the clientproxy logging, but lets fix this up then we can test this better.

::: sut_tools/updateSUT.py
@@ +132,4 @@
>      if tegra_name in (None, ''):
>          raise ImportError("To use updateSUT.py non-standalone you need SUT_NAME defined in environment")
>      else:
> +        print "DEBUG: updateSUT: Using tegra '%s' found in env variable" % tegra_name

should this (file) convert to use the log?

::: sut_tools/verify.py
@@ +7,4 @@
>  import sys
>  import os
>  import time
> +from sut_lib import pingTegra, setFlag, connect, log, getSUTLogger

Same comment as earlier with log/getSUTLogger

@@ +167,3 @@
>              dm.removeFile("/mnt/sdcard/writetest")
> +        log.info("INFO: attempting to create file /mnt/sdcard/writetest")
> +        if not dm.pushFile(os.path.abspath("verify.py"), "/mnt/sdcard/writetest"):

Does this work when we're not cwd as /builds/sut_tools?

@@ +212,5 @@
>      def watcherDataCurrent():
>          remoteFileHash = dm.getRemoteHash(realLoc)
>          if currentHash != remoteFileHash:
> +            log.info(dm.pullFile(realLoc))
> +            log.info(watcherINI)

leftover debugging?
Attachment #658283 - Flags: feedback+
Created attachment 660088 [details] [diff] [review]
use the sut_tools for a smoketest (2.0)

updated patch to account for almost all review feedback except:
* apk name: we will have a static apk and tests package.  This is by design
* use of the name sut_tools.log
* use of the logging.getLogger('') <- this is actually useful as we attach the console logger to the root and the logger '' is the root
* mimicking the clientproxy logging init

I would like to figure out a good solution for the logging init and stuff so we can r+ and land :)

This patch should apply cleanly.
Attachment #658283 - Attachment is obsolete: true
Attachment #658283 - Flags: review?(bugspam.Callek)
Attachment #660088 - Flags: review?(bugspam.Callek)

Updated

6 years ago
Blocks: 790595
Comment on attachment 660088 [details] [diff] [review]
use the sut_tools for a smoketest (2.0)

I guess this makes my patch in bug 790613 obsolete.

Please may you make sure any errors are prefixed with "Automation Error:", "Remote Device Error:" or else any other prefix that we can add to TBPL :-)
(In reply to Ed Morley [:edmorley UTC+1] from comment #12)
> Please may you make sure any errors are prefixed with "Automation Error:",
> "Remote Device Error:" or else any other prefix that we can add to TBPL :-)

That wasn't very clear, sorry.

What I meant to say:
* Is there a rough idea for timeframe of this? Should I land 790613 once it gets review anyway, or it going to make work harder here / this will be landing soon anyway?

* The patch changes a number of failures to use log.error, can we get those to use the new strings and I'll deal with any leftovers in bug 790613 and bug 790595.
Blocks: 790613
Created attachment 661284 [details] [diff] [review]
use the sut_tools for a smoketest (3.0)

oh, the latest version
Attachment #660088 - Attachment is obsolete: true
Attachment #660088 - Flags: review?(bugspam.Callek)
Attachment #661284 - Flags: feedback?(bugspam.Callek)
(Assignee)

Comment 15

6 years ago
Comment on attachment 661284 [details] [diff] [review]
use the sut_tools for a smoketest (3.0)

Review of attachment 661284 [details] [diff] [review]:
-----------------------------------------------------------------

so on a feedback-level skim, looks pretty good. A few things I know would block my review, and a few things I feel would HELP my review [but not blocking it]

So, Blocking review, (comments below) and if we're switching devicemanager*.py version here like this, lets switch all of sut_tools while we are at it, or at the VERY least, anything and everything that uses one of these files you're switching (e.g. clientproxy.py)

For "would help me":
* A bit more context, lets say -U8 instead of -U4.
* If you could split the changes into parts, it would assist my review, but is not a hard requirement
** My theory on great splitting: 
*** Update to mozdevice
*** Changing to use logging everywhere
*** Changes to add/support the smoketest script+pandas.

::: sut_tools/cleanup.py
@@ +40,5 @@
>          dm = devicemanager.DeviceManagerSUT(tegra)
>          dm.debug = 5
>  
> +    outputfile = StringIO.StringIO()
> +    dm._doCmds([{'cmd': 'exec pm list packages'}], outputfile, None)

I thought I remembered you saying there was a problem with this.

And actually, would execsu work here instead (as in, does it make more sense anyway)

::: sut_tools/installApp.py
@@ +140,4 @@
>  
>          width, height = getResolution(dm)
>          #adjust resolution down to allow fennec to install without memory issues
> +        if (width == 1600 or height == 1200):

specific reason for changing this check?

@@ +178,5 @@
>      robocop_to_use = find_robocop()
>      if robocop_to_use is not None:
>          waitForDevice(dm)
> +        if installOneApp(dm, devRoot, robocop_to_use, proxyFile, errorFile):
> +            sys.exit(1)

Does this mean we would error if we fail to install roboCop? Should we?

::: sut_tools/smoketest.py
@@ +58,5 @@
> +    return False
> +
> +def smoketest(device_name, number):
> +    global dm
> +    global appFileName, processName

nit: anything we are globaling, can you set/define them at the top of the file, so they are known prior to them being visible/accessed here.

::: sut_tools/sut_lib.py
@@ +23,5 @@
>  import json
>  
> +def getSUTLogger(filename=None, loggername=None):
> +    if not filename:
> +        filename = "sut_tools.log"

This still doesn't give a solution to have _no_ file level logging, (as in stuff is run in buildbot)

@@ +474,5 @@
> +        if dm._sock:
> +            dm._sock.close()
> +            dm._sock = None
> +        dm.deviceRoot = None
> +        dm.retries = 0

Going purely by context for the moment, I'd want at least some sort of delay between tries.
Attachment #661284 - Flags: feedback?(bugspam.Callek) → feedback+

Updated

6 years ago
No longer blocks: 790613
Depends on: 795028
Depends on: 795071
Depends on: 795074
Created attachment 666673 [details] [diff] [review]
add smoketest to the sut_tools (4.0)

this is a patch that is probably not ready for review, but I would love to get some general feedback on it.  This adds a smoketest along with changing all the reboot logic to use the PDU instead of dm.reboot().

I haven't really gone through this patch in detail, so there could be ugly hacks or nasty debugging lines in here.  This depends on the 3 patches we have from earlier.
Assignee: wlachance → jmaher
Attachment #661284 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #666673 - Flags: feedback?(bugspam.Callek)
Depends on: 797868
Created attachment 667984 [details] [diff] [review]
add smoketest to the sut_tools (5.0)

updated patch to work with pdu reboot
Attachment #666673 - Attachment is obsolete: true
Attachment #666673 - Flags: feedback?(bugspam.Callek)
Attachment #667984 - Flags: feedback?(bugspam.Callek)
Summary: Need autophone test to burn in a set of machines → Develop smoketest to burn in mobile devices

Updated

6 years ago
No longer blocks: 790595
(Assignee)

Comment 18

5 years ago
Comment on attachment 667984 [details] [diff] [review]
add smoketest to the sut_tools (5.0)

Review of attachment 667984 [details] [diff] [review]:
-----------------------------------------------------------------

Adjusted for bitrot....
Attachment #667984 - Flags: feedback?(bugspam.Callek) → feedback-
(Assignee)

Comment 19

5 years ago
Created attachment 675721 [details] [diff] [review]
[tools] v1.2

So Joel, While this patch is almost exactly yours, especially the smoketest.py I'm confused on if/when the smoketest.py here would ever be used again. Whom would use it, and how.

Since this script has a lot hardcoded, and will need to be documented/fixed up, as well as us understanding when it will run, I'd like to evac sut_tools of stuff that is obsolete/not needed into the future.
Assignee: jmaher → bugspam.Callek
Attachment #667984 - Attachment is obsolete: true
Attachment #675721 - Flags: review?(kmoir)
Attachment #675721 - Flags: review?(jmaher)
Comment on attachment 675721 [details] [diff] [review]
[tools] v1.2

Review of attachment 675721 [details] [diff] [review]:
-----------------------------------------------------------------

this looks good.  We really should all agree on how we want to run the smoketests, where the related bits live and how we want to report on them.  What I have is something that worked for me, but I can see how it will be confusing to others.
Attachment #675721 - Flags: review?(jmaher) → review+

Updated

5 years ago
Attachment #675721 - Flags: review?(kmoir) → review+
(Assignee)

Comment 21

5 years ago
Landed with http://hg.mozilla.org/build/tools/rev/56d2168d7a07
And backed out part of it with http://hg.mozilla.org/build/tools/rev/400a4f4958d8

Calling this bug (as written) done. We have followup(s) on handling uninstall better
(Assignee)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.