Open Bug 524130 Opened 10 years ago Updated 4 years ago

running tests from packaged builds should be easier

Categories

(Testing :: General, defect)

Tracking

(Not tracked)

People

(Reporter: ted, Unassigned)

Attachments

(1 file, 1 obsolete file)

Currently, our buildbot configs have to specify a lot of options to run test suites on a packaged build. We should make this easier, by making the harnesses know the test package structure, and defaulting to the right options in that case, or by packaging up a Makefile in the test package that has the same targets as testsuite-targets.mk, so you can just "make mochitest-plain" or "make reftest" etc.

catlee suggests doing both would be even better, since the Makefile gets us consistency, and making the harness scripts smarter makes it easier for testing and development anyway.
Adding jmaher to this bug as he's talking about doing something very similar with his wrapper code for Fennec.
Taking mochitest as an example, I have to run this on mobile:
cp fennec.*.bz2 /tests
cp xulrunner.*tests.tar.bz2 /tests
cd /tests
<unzip/tar *.bz2>
cd mochitest
python runtests.py --appname=../fennec/fennec --xre-path=../fennec/xulrunner --certificate-path=../certs --utility-path=../bin --autorun --close-when-done --log-file=mochitest.log

So what I am proposing is that we agree on the 'standard' way to unpack these and have the scripts look for binaries and files relative to that 'standard' location.

If that is the default, then I could just run 'python runtests.py' and come back a few hours later and see a mochitest.log file waiting for me.  I would also expect it to detect when the directory structure lives inside an $(objdir) and to use the standard values that are currently used for 'make mochitest'.

I understand that things are a bit more complicated for mobile, since we have both the xulrunner and mobile applications.  Also, for mobile we don't have 'make' available in many environments, so I would vote against going the Makefile route.
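A minimal sketch of the "look in the standard location" idea, assuming the layout from the commands above (mochitest, fennec, certs and bin unpacked side by side). The function names `default_options` and `to_argv` are hypothetical and are not part of runtests.py; only the four option names come from the command line quoted above.

```python
import os


def default_options(harness_dir):
    """Guess harness options from an unpacked test package.

    Assumed layout (hypothetical, taken from the manual steps above):
    <root>/mochitest, <root>/fennec, <root>/certs, <root>/bin
    """
    root = os.path.dirname(os.path.abspath(harness_dir))
    candidates = {
        "appname": os.path.join(root, "fennec", "fennec"),
        "xre-path": os.path.join(root, "fennec", "xulrunner"),
        "certificate-path": os.path.join(root, "certs"),
        "utility-path": os.path.join(root, "bin"),
    }
    # Keep only the defaults that actually exist on disk, so a
    # non-standard layout falls back to explicit command line options.
    return {name: path for name, path in candidates.items()
            if os.path.exists(path)}


def to_argv(options):
    """Turn the guessed options into --name=value arguments."""
    return ["--%s=%s" % (name, path) for name, path in sorted(options.items())]
```

With something like this in the harness, any option the user passes explicitly would simply override the guessed value.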
I think I'd like to do both, since that will make our buildbot scripts able to do the same thing in the packaged or unpackaged cases, and also simplify testing and the mobile situation. It will be tricky to get the mobile case right, but we can probably make it a lot simpler.
Duplicate of this bug: 485182
Things like the trace-malloc tests should also be simpler to run (bug 525234).
Depends on: 525234
I would say that's unrelated, as we don't currently run them as part of our test suites.
No longer depends on: 525234
Copying (or creating) Makefiles that let testers download builds and tests and then execute them would help a lot with our graphics test outsourcing.

I'm CC:ing Matt Evans, who is very interested in getting our testing contractors better able to run our actual test suite.
Hello A-team, I would like to raise the priority on this. Can we get this on your radar for Q4? Really appreciate it.
What are the scenarios that are the most important here?
* desktop?
* mobile?
* makefiles as :joedrew suggested?
* shell scripts?
The most important scenario is for desktop builds. I suggested Makefiles because we already have those built, but it's not really important how it happens.
Attached file: A script to run things right now. (obsolete)
Matt & Joe,

Here is a simple script that does what you want.  I agree with you that this needs to be simplified and made easier. However, with the user responsiveness testing, the greening and expansion of our mobile support, and the b2g projects, I do not think this is a priority 1 issue for 2011 Q4.

I'd prefer we focus on making running mobile tests on device easier.  I think that is far higher priority than this bug for Q4.  One of the first tasks in this area is Joel's push to make running remote talos easier which is a huge PIA to set up by hand.  You can check out that work in bug 688604.

If we have time to work on this issue this quarter, then we will, but I really want to concentrate on making mobile easier to run this quarter as running on the desktop browser is quite easy to script.
I don't think this is a huge amount of work if we just want to make it work for desktop builds. I could probably get it done in a day or two. Making it work for mobile testing would be trickier, but with all the work jmaher and others have done it's probably not significantly harder.
A-team, thanks for the help. As Joe mentioned, at the moment we are only concerned about desktop builds. The script from Clint is certainly a start. That said, it's not just gfx tests we want to run on the platform machines; it's all the regression tests on each supported OS. The outsourced company is reasonably technical, so this doesn't have to be fancy. However, the idea is to be able to download the builds and run all regression tests for each channel. At the end, the test runners should be able to easily gather and report pass/fail results. If those results are present in a log file, that may work.
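As a rough illustration of gathering pass/fail results from a log file, here is a sketch that counts result lines. It assumes each result line in the harness log carries a `TEST-<STATUS> |` marker such as `TEST-PASS` or `TEST-UNEXPECTED-FAIL`; the helper names `summarize` and `run_passed` are made up for this example.

```python
import re
from collections import Counter

# Assumed log format: "... TEST-<STATUS> | <test name> | <message>"
STATUS_RE = re.compile(r"(TEST-[A-Z-]+) \|")


def summarize(lines):
    """Count how many log lines carry each TEST-* status."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def run_passed(counts):
    """Treat the run as green only if no TEST-UNEXPECTED-* status was seen."""
    return not any(status.startswith("TEST-UNEXPECTED") for status in counts)
```

A tester could call `summarize(open("mochitest.log"))` after a run and report the resulting counts per suite.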
This bug was originally filed for folks who don't have a build tree, who just download a binary and tests package from the ftp server, and who have no required tools other than python (not even make).  It was intended to reduce the number of command line options required to run the tests (i.e. python runtests.py would default to running everything).

The script that Clint attached will do more than that and download everything from the ftp site, unpack it and run it.  With that script they would only need to modify these variables as we change versions:
FIREFOX_BINARY_PACKAGE_NAME=firefox-9.0a1.en-US.linux-i686.tar.bz2
FIREFOX_TEST_PACKAGE_NAME=firefox-9.0a1.en-US.linux-i686.tests.zip
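Since the two names differ only in their suffix, they could be derived from a single version string. This is only a sketch based on the two Linux file names above; the function name and its defaults are hypothetical, and other platforms use different package extensions that this does not cover.

```python
def package_names(version, locale="en-US", platform="linux-i686"):
    """Build the binary and tests package names for a Linux build.

    The naming pattern is inferred from the two example names above and
    is an assumption, not a guarantee about the ftp server layout.
    """
    base = "firefox-%s.%s.%s" % (version, locale, platform)
    return base + ".tar.bz2", base + ".tests.zip"


binary_package, test_package = package_names("9.0a1")
```

That way only the version needs editing when the builds move on.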


Some caveats:
* while running these tests a human or animal cannot touch the machine in any way shape or form, otherwise focus or mouse movements will affect the results of the tests
* I recommend running in a VM instead of the host OS so they can still use the host machine
* the machine needs to be set up to not sleep or start a screen saver
* the machine needs to have a high enough screen resolution (1920x1200, or 1600x1200)
* the script should be modified to write its output to log files, so they can do a quick tail on each log file and see the summary at the end
* the machines they run on need ample CPU and memory
* on my Ubuntu 11.04 desktop I cannot get all the tests to pass (one issue is the DNS settings from my internet provider)

Lastly, we already run all the tests on all the builds via our automation infrastructure.  Maybe they don't need to do this?  

Joe, would the attached script help?  Maybe if it was broken down into sub scripts after the tests are downloaded?
Thanks Joel. I believe the script and your info are probably sufficient for now. Question: we should be able to do this on the Windows and OS X platforms as well, with appropriate script modifications, correct?

The main focus is to run regression tests on a set of reference platforms (https://docs.google.com/spreadsheet/ccc?key=0Aj-5ZZLSQ6mrdHZOSTl4ZE1tNkFfeThHUlNPckRrRHc&hl=en_US#gid=1) to determine if regression failures/issues can be attributed to the configuration of the platform. So the high level idea here is to augment our build automation to catch those errors that may be attributed to platform hardware and software configuration incompatibilities. The platforms used for these tests would not be in the set of machines contained in our build automation infrastructure or at least the configuration would be markedly different.

Having said that, this may not be the best methodology to achieve the goal of catching compatibility regressions. Or at least running all automation tests on all reference platforms and parsing the results for what would be an overall pass/fail of the suite may be too costly in this context. Maybe there is a good subset (smoketest) of the automation tests that are robust by nature, yet would catch most incompatibility regressions? I would welcome any advice in this regard. I think the next step would be to try it with the suggested script modifications to record the results.

We can determine later if the test run results are too noisy to provide clear indication of compatibility issues.
Matt, I believe on Windows we run the tests inside the MozillaBuild package:
http://ftp.mozilla.org/pub/mozilla.org/mozilla/libraries/win32/MozillaBuildSetup-Latest.exe

This gives a Unix-like shell and python binaries.  For OS X, the script should work fine.  Actually, I will upload a new one shortly that writes to a log file and tails the log files at the end.

One question I have is who is going to work on debugging the failed tests?
This script redirects output to a log file, then tails the last 5 lines of the log file.

Keep in mind that the end user should edit the test path in the script.  There are ample comments in the script that outline how to run all tests.

This depends on wget and python being installed.
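For anyone without a shell handy, the redirect-then-tail step could look roughly like this in plain python. This is not the attached script, just a sketch of the same idea; `run_and_tail` and `tail_lines` are hypothetical helper names, and the command passed in is whatever harness invocation the tester already uses.

```python
import subprocess
from collections import deque


def tail_lines(log_path, count=5):
    """Return the last `count` lines of a log file."""
    with open(log_path, errors="replace") as log:
        # A bounded deque keeps only the most recent lines in memory.
        return list(deque(log, maxlen=count))


def run_and_tail(command, log_path, count=5):
    """Run a command with stdout and stderr sent to log_path, then tail it."""
    with open(log_path, "w") as log:
        status = subprocess.call(command, stdout=log, stderr=subprocess.STDOUT)
    return status, tail_lines(log_path, count)
```

For example, `run_and_tail(["python", "runtests.py"], "mochitest.log")` would return the exit status together with the last 5 lines of mochitest.log.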
Attachment #562104 - Attachment is obsolete: true
That script (comment 17) looks good.  We can easily supply a script to run everything (not just gfx).  If we run it from the windows-build msys environment the script should work (with the defines at the top properly changed).  I think that if you're going to outsource this we should probably provide you with three scripts - one for each OS.  In my experience, you want to keep things that are outsourced pretty simple -- there's always a lot of room for errors in communication.

Would these scripts do what you want, Matt?


== Unrelated concerns ==
This might be better to take offline into another bug as it's not related here, but I wanted to voice it anyway.
In general, I think this is an ambitious and great idea.  But I'm concerned about how we'll turn the information we'll be getting into actionable data.

(In reply to Joel Maher (:jmaher) from comment #16)
> 
> One question I have is who is going to work on debugging the failed tests?
I want to really emphasize this.  If we get failures on these configurations, that's awesome information, but how are the developers going to reproduce it and fix it?

How will we cross-reference the data of failures we get from these machines with the data in orange factor so that we can filter out known intermittent failures?  Because we are certain to get intermittent failures. FWIW, I've *never* had a green run running on my own desktop hardware when I used to run the tests from time to time.  I'd often forget the machine was busy and I'd move the mouse, or worse, I'd click on something, and a test would fail.  We really need to underscore that there can be no interaction with these machines while they are testing.
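To make the cross-referencing question concrete, here is a sketch of the filtering step. It assumes the known intermittent failures are available as a plain list of test names exported by hand; it does not talk to orange factor, and `split_failures` is a made-up name.

```python
def split_failures(failed_tests, known_intermittent):
    """Separate failures into (new, already known intermittent) lists.

    Both inputs are iterables of test names; how the known list is
    obtained is left open and is an assumption of this sketch.
    """
    known = set(known_intermittent)
    failed = set(failed_tests)
    new_failures = sorted(failed - known)
    known_failures = sorted(failed & known)
    return new_failures, known_failures
```

Only the first list would be worth a developer's time on a first pass; the second still needs a human to confirm the failure really matches the known intermittent.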
Depends on: 1058923