Closed Bug 914632 Opened 11 years ago Closed 8 years ago

[Tracking] Stand up Gaia Try infrastructure

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: cmtalbert, Unassigned)

There are two basic methodologies we want to support.
1. For situations where you need to test a change to both gecko and gaia: from gecko, we want to edit gaia.json to point to a specific gaia commit and run it on our normal hg try (this will build gecko b2g desktop builds and then run gaia tests against those builds).
2. For situations where you need to test gaia with an existing gecko (for instance, debugging why a gaia test fails on tbpl): we need a gaia-try hg repo containing a JSON file that points to the gecko build we want to use and the gaia commit we want to run. The buildbot automation will use the referenced gecko build to schedule gaia tests, and this repo will constitute gaia-try.
2b. Eventually we would like to have a web form to abstract the fact that a gaia developer would have to push to hg in step 2.

For both of these scenarios we'll need the ability to pull down code from github in order to reference the gaia commits. But in both cases we're only pulling down code to use during try testing; none of the code we pull down this way will be shipping in any configuration.
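As a rough illustration of methodology 1, here is a minimal sketch of pinning gaia.json to a specific gaia commit before pushing to hg try. The key names ("git", "remote", "git_revision") and the function name are assumptions made for this example; the actual layout of gaia.json may differ.

```python
import json


def point_gaia_at_commit(manifest_text, remote, revision):
    """Return gaia.json text pinned to the given gaia commit.

    The "git", "remote" and "git_revision" keys are hypothetical names
    used for illustration; check the real gaia.json before relying on them.
    """
    manifest = json.loads(manifest_text)
    # Keep any existing keys untouched and only override the gaia pointer.
    git_section = manifest.setdefault("git", {})
    git_section["remote"] = remote
    git_section["git_revision"] = revision
    return json.dumps(manifest, indent=4, sort_keys=True) + "\n"
```

The updated text would then be written back to gaia.json and pushed to try together with the gecko change.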

This bug tracks the work for setting up this automation.
Depends on: 914627
Depends on: 914638
Depends on: 914640
Depends on: 914651
joduinn has asked for a few more details about this in order to scope and prioritize this work properly, so here goes:

Use cases:
1 - The primary use case is to allow developers to debug gaia-specific failures in TBPL.
2 - A secondary use case is to allow pre-testing of gaia changes, or coupled gecko-gaia changes, before landing.  Normally this is done using Travis, but there may be cases where developers would want to do this on try as well, such as when past changes to this code have broken tests in TBPL but not Travis.

Builds:
Initially, we will only need b2g desktop builds, since that's the only build type we're running gaia tests against.  Later, we'll likely want emulator builds and tests as well, since we intend to stand up gaia tests on emulators.  We will never need device builds for this.

How we're going to support this:

== Gecko and Gaia changes ==

Once catlee's work in bug 899969 is done (for specifying all repos in a manifest instead of just gaia), all that will be needed is to allow the manifest to specify an arbitrary github branch for the gaia commit, and have that honored by the build and test jobs.  Then, builds and tests will be scheduled as normal.  We already have security permission for github access, bug 914651.

== Gaia changes only ==

To support very fast turnaround time for testing gaia changes only, we want to support running tests against existing b2g desktop/emulator builds.  To this end, we will create a gaia-try repo in hg, which will contain only a manifest that specifies an arbitrary gaia commit (potentially on github), and a build identifier (possibly a url).

When someone pushes to this repo, no build jobs will be scheduled; instead, test jobs will be scheduled (bug 914640), and the build url and gaia commit from the manifest will be passed to the test jobs in build_props.  The test jobs will download the specified build, clone the specified gaia, and run the tests.
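To make the flow above concrete, here is a minimal sketch of turning a gaia-try manifest into properties for a test job and planning the download and clone steps. The manifest keys (build_url, gaia_repo, gaia_revision), the function names, and the output file name are all assumptions for illustration, not an agreed format; the commands are only built, not executed.

```python
import json

# Hypothetical manifest keys; the real gaia-try manifest format is not settled here.
REQUIRED_KEYS = ("build_url", "gaia_repo", "gaia_revision")


def manifest_to_build_props(manifest_text):
    """Validate the manifest and return the properties to pass to test jobs."""
    manifest = json.loads(manifest_text)
    missing = [key for key in REQUIRED_KEYS if not manifest.get(key)]
    if missing:
        raise ValueError("gaia-try manifest is missing: " + ", ".join(missing))
    return {key: manifest[key] for key in REQUIRED_KEYS}


def plan_test_job(build_props, workdir="build"):
    """Return (description, argv) steps a test job would run; nothing is executed."""
    gaia_dir = workdir + "/gaia"
    archive = workdir + "/b2g-desktop-build"  # hypothetical file name
    return [
        ("download build", ["curl", "-L", "-o", archive, build_props["build_url"]]),
        ("clone gaia", ["git", "clone", build_props["gaia_repo"], gaia_dir]),
        ("check out commit", ["git", "-C", gaia_dir, "checkout", build_props["gaia_revision"]]),
    ]
```

The step that actually runs the tests is left out because it depends on the test harness in use.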

B2G desktop test jobs already know how to clone gaia and use that to run tests, so they'd need only slight tweaking to support arbitrary github branches.  Emulator test jobs do not know how to use custom gaia commits, so would have to be updated to do this.

James, Chris, does this sound correct from both your perspectives?
Flags: needinfo?(jlal)
Flags: needinfo?(catlee)
Yes, this is a correct summary of our discussions during Oslo (and before).
Flags: needinfo?(jlal)
Per comments in today's b2g cross functional meeting, this bug is ready for scoping and prioritization by rel-eng.
Flags: needinfo?(catlee) → needinfo?(joduinn)
Now that we have gaia's integration test suite running on TBPL it is critically important for 

> 1 - The primary use case is to allow developers to debug gaia-specific failures in TBPL.

that we fix this. There's a big push in gaia right now to get our test suite stabilized on TBPL, but we can't move forward until this gets resolved.
Flags: needinfo?(jgriffin)
Flags: needinfo?(catlee)
Flags: needinfo?(anygregor)
What's the current status? Are we blocked on rel-eng here?
Flags: needinfo?(anygregor)
Yes, this is largely a rel-eng task.

There is a group of us across the a-team, rel-eng, and gaia who are starting work on a system that would allow gaia pull requests to trigger test jobs.  This might be the optimal path to implementing gaia tryserver, but it's also a longer one; we probably wouldn't be able to build out gaia tryserver support until Q2 using this work.

We could probably implement gaia tryserver support faster using the plan detailed in comment #2, but it may come at the expense of the above project, depending on how many resources rel-eng can marshal for these.

I think at the first B2G Engineering Meeting in the new year, we should invite a couple of extra people (rel-eng, Gareth, and other interested folks) and figure out what we want to do here.
Flags: needinfo?(jgriffin)
We already have a case in https://bugzilla.mozilla.org/show_bug.cgi?id=944697 where a test is only failing consistently on TBPL and I am not sure what advice to give the test writer :(.
The current plan of record is to rely on TaskCluster to implement support for gaia-try.  TaskCluster is a new (buildbot-independent) system for CI that is in development, which will initially be used for Gaia Shepherd; see https://wiki.mozilla.org/Auto-tools/Projects/TaskCluster.  Our goal is to move all gaia testing to it, probably in Q2; it will easily be able to run tests against arbitrary PRs, for example, which will make implementing gaia-try pretty easy.

If we were to try and get gaia-try running in Q1 against buildbot, it would probably slow down work on TaskCluster, so it seems like waiting for TaskCluster is probably the right thing to do, but if you feel strongly otherwise, please let us know.
Flags: needinfo?(john+bugzilla)
Depends on: 986209
Depends on: 989125
Depends on: 989126
Depends on: 989131
Depends on: 989159
Depends on: 999086
No longer depends on: 989125
(In reply to Gareth Aye [:gaye] from comment #5)
> Now that we have gaia's integration test suite running on TBPL it is
> critically important for 
> 
> > 1 - The primary use case is to allow developers to debug gaia-specific failures in TBPL.
> 
> that we fix this. There's a big push in gaia right now to get our test suite
> stabilized on TBPL, but we can't move forward until this gets resolved.

I think we have this now, along with (2), with bug 986209.
https://tbpl.mozilla.org/?tree=Gaia-Try
Flags: needinfo?(catlee)
I think we're done here, from a Releng perspective.
Depends on: 1006693
No longer depends on: 1007435
Depends on: 1009695
Depends on: 1037005
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: General Automation → General