Closed Bug 776728 Opened 12 years ago Closed 11 years ago

perform acceptance test of panda chassis configuration

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

ARM
Android
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Unassigned)

Details

(Whiteboard: [reit-panda] u=panda c=releng p=2)

+++ This bug was initially created as a clone of Bug #769429 +++

(In reply to Clint Talbert ( :ctalbert ) from bug 769429 comment #10)
> Great, Armen correct me if I'm wrong, but our next step should be to get the
> agent on these and try running tests?

Yes - we need to simulate production usage of this chassis. At a minimum (ideally), that means:
 a) impose max load on all devices simultaneously until temp stabilizes (demonstrate thermal & power supply capacities)
 b) test the impact of turning on RF devices on adjacent boards (demonstrate isolation of each board)
 c) demonstrate control of power to each Panda board via SNMP

I'm hoping some of the existing tests can be re-purposed to help with (a) & (b).
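For item (a), "until temp stabilizes" needs a concrete stopping rule. A minimal sketch of one possible rule follows, assuming temperature readings in degrees Celsius are already being collected somehow. The function name, the window size, and the tolerance are all illustrative assumptions, not values agreed anywhere in this bug.

```python
from collections import deque
from typing import Iterable, Optional


def stabilized_at(readings: Iterable[float],
                  window: int = 5,
                  tolerance_c: float = 1.0) -> Optional[int]:
    """Return the index of the first reading at which the most recent
    `window` readings differ by no more than `tolerance_c`, or None if
    the readings never settle.

    Both defaults are placeholders (assumptions), not measured values.
    """
    recent = deque(maxlen=window)  # sliding window of the latest readings
    for index, temp in enumerate(readings):
        recent.append(temp)
        if len(recent) == window and max(recent) - min(recent) <= tolerance_c:
            return index
    return None
```

Under max load, the readings would be sampled at a fixed interval and the load kept on until the function returns an index for every board in the chassis.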
First off, thanks for filing this bug.  This is what we need!

These devices won't be using wifi, only ethernet, and should just have an sdcard (os+data) and power supply.  That should minimize the interfaces we access.

I would say let's build a suite using sut_tools (maybe mozharness) that does a simple smoke test (a really small test suite that will pass, e.g. a 5 minute mochitest dom-level-1 run), and put this in a loop with reboots, uninstall, install.

We have mozharness for talos that could be configured to run a smaller version of the test, or we could get sut_tools and a foopy lined up.  Either way we need a machine to control this.

In doing this, I would recommend about two days of monitoring (probably a lot of hands-on fiddling) until we have it working in a continuous loop on 1 board.  Then a day of close monitoring while we scale to 10 boards.  I say a full weekend (60+ hours) of nonstop automation with no failures would be ideal.
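The loop described above (uninstall, install, reboot, smoke test, repeated for a fixed duration with zero failures as the pass criterion) could be sketched roughly as below. This is only an outline under assumptions: `run_soak`, `SoakResult`, and the step names are hypothetical, and each step is a placeholder callable rather than a real sut_tools or mozharness call.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SoakResult:
    cycles: int = 0
    failures: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Pass only if at least one cycle ran and nothing failed.
        return self.cycles > 0 and not self.failures


def run_soak(steps: Dict[str, Callable[[], bool]],
             duration_s: float,
             clock: Callable[[], float] = time.monotonic) -> SoakResult:
    """Run the named steps in order, over and over, until `duration_s`
    seconds have elapsed. A step fails if it returns False or raises.
    The remaining steps of a failed cycle are skipped.
    """
    result = SoakResult()
    deadline = clock() + duration_s
    while clock() < deadline:
        result.cycles += 1
        for name, step in steps.items():
            try:
                ok = step()
                detail = name
            except Exception as exc:  # a crashed step counts as a failure
                ok = False
                detail = f"{name} ({exc})"
            if not ok:
                result.failures.append(f"cycle {result.cycles}: {detail}")
                break
    return result
```

For the weekend run, `duration_s` would be `60 * 3600`, with one such loop per board and the steps wired to whatever tooling is chosen.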

This should be a good starting point. Let's get agreement on what we need to test first before writing custom scripts for this.
Hal, who from releng or ATeam is on point for writing these tests (Joel gave some good recommendations)?
Sign me up for coordinating or doing the script writing.  There are a few ateam members who are working on panda boards, so it might be a split task, or handed off if my skills are deemed too much in need elsewhere or not good enough.
Depends on: 776977
Depends on: 776987
This sounds good. Both Joel and Will can help write the scripts to ensure the pandas are working.  We will need releng support to bring up a foopy to control the chassis of 10 pandas.
(In reply to Joel Maher (:jmaher) from comment #1)
> These devices won't be using wifi, only ethernet, and should just have an
> sdcard (os+data) and power supply.  That should minimize the interfaces we
> access.

Just asking based on being burned elsewhere - is wifi disabled on the board or in firmware, or is it just not the convention to write a test that turns wifi/phone/bluetooth on? (Will the same also hold for b2g?)

I don't believe we need to take any action, necessarily, but would like to have radio usage documented as part of our spec for the chassis.
We haven't explicitly disabled the wifi, we're just not activating it. If necessary, we can look into disabling it more thoroughly.
Depends on: 777370
(In reply to Ted Mielczarek [:ted] from comment #6)
> We haven't explicitly disabled the wifi, we're just not activating it. If
> necessary, we can look into disabling it more thoroughly.

Good enough for now - the documentation is key - thanks!

Can you confirm same holds for phone/bluetooth (or that hardware doesn't exist on panda boards)?
Sorry for the noise again!
No longer blocks: android_4.0_testing
I am looking at writing such a script with mozharness, currently trying to unwind how it works. It looks like scripts/device_talosrunner.py might be just the ticket to writing a simple smoketest. Will do more looking into it tomorrow!
I just barely managed to get talos's ts running against my phone using mozharness today, though I found some minor issues: bug 779983, bug 779980, bug 779979. There is still the issue of parsing/using the results. 

To be honest I am not sure if Talos/MozHarness are the ideal tools to use here as it can be very difficult to see what's going on if things go wrong, given the present state of things. It does seem to have some nice features, but I think it might need some more time to bake (or for me to just understand it better).

I am strongly thinking of just writing a small utility tomorrow to just do the equivalent of Talos (start the browser repeatedly and make sure it's working). I'll sleep on it.
Good writeup Will.  I found that in talos the getInfo.html is all that works on the panda right now.  I have gone back a whole month in builds with no luck in changing the results.  Also reftests work, but that seems a bit too complex for what we are looking to do.

Maybe a stripped down getInfo.html from talos will be all we need!
So I found that the smoketest in the autophone framework already did *almost* exactly what we wanted to do here, so I decided to piggy-back on top of it by enhancing it to support running an arbitrary number of times and searching for failures. Basically it just reboots the machine and starts Fennec repeatedly, which is really all we want to test. Yay for not writing yet another piece of code with its own deployment story.

Anyway, I'm going to be gone on Monday, but I'll try to have something that people can pull and use (along with appropriate instructions) sometime Tuesday.
Whiteboard: [reit-panda] → [reit-panda] u=panda c=releng p=2
Depends on: 781341
Filed bug 781341 to explicitly track the smoketest.
This is done.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard