As discussed in a meeting at the end of March with Selena, we would like to start the migration from our custom CI to Taskcluster. The first step should be to get the functional and update tests to run on Taskcluster nodes instead of our own. That means mozmill-ci will create its own task graphs and submit them to Taskcluster once requests come in via Mozilla Pulse. Results will be reported to Treeherder.
In terms of machine load, it will add only a few additional tasks for now. As a start, we want to test about 5 locales for nightly builds on mozilla-central and mozilla-aurora. That means 20 tasks per day if there is only a single nightly build per branch.
If all works fine, we would also like to expand our testing to cover all locales. Selena, mind telling me upfront if that will become a problem regarding the current capacity? We are talking about 100 locales here, which means about 400 tasks a day. Each task takes about 15min.
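For reference, the numbers above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes one functional and one update test task per locale, branch, and nightly build, which is my reading of how 5 locales become 20 tasks, and the function name is made up for illustration.

```python
def tasks_per_day(locales, branches=2, test_types=2, nightlies_per_branch=1):
    """Number of test tasks submitted per day.

    Assumption: one task per locale, branch, test type (functional and
    update), and nightly build.
    """
    return locales * branches * test_types * nightlies_per_branch


TASK_MINUTES = 15  # approximate duration of a single task, as stated above

initial = tasks_per_day(5)   # about 5 locales on mozilla-central and mozilla-aurora
full = tasks_per_day(100)    # all locales
print(initial, full)
print(full * TASK_MINUTES / 60, "machine hours per day")
```

Under these assumptions, full locale coverage comes to roughly 100 machine hours per day, spread over many short tasks that can run in parallel.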
I had the chance to speak to Rail and Aki today. With their helpful feedback I would propose the following steps:
1. Get update tests integrated into the funsize taskgraph (https://github.com/mozilla/funsize/blob/master/funsize/tasks/funsize.yml). It means we create a task which depends on balrog_task and iterates over all locales for the generated partial updates. We can re-use variables similar to line 38. Rail was not sure how to test the graph and will look into that again.
2. Get functional tests integrated into the gecko decision graph. The latter doesn't have support for nightly and l10n builds yet, but Aki is currently working on it. Maybe by next week they will have a security consensus, so that implementation of the feature can start, and it might be available by end of Q2 or early Q3 as Tier-2.
If 2) does not make Q2 and we have a strong need for it, I could implement our own taskgraph via mozmill-ci by still listening to the Mozilla Pulse messages.
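To make step 1 a bit more concrete, here is a minimal sketch of how per-locale update test tasks could be appended to a graph so that each of them depends on the Balrog task. The `requires` key reflects the task graph format as I understand it; the helper name, the worker type, the task ids, and the payload command are hypothetical placeholders and are not taken from funsize.yml.

```python
import uuid


def update_test_tasks(balrog_task_id, locales, platform="linux64"):
    """Return one graph node per locale, each depending on the Balrog task."""
    nodes = []
    for locale in locales:
        nodes.append({
            # Placeholder id; real Taskcluster slugIds are encoded differently.
            "taskId": uuid.uuid4().hex,
            # The update test must only start once Balrog has the partials.
            "requires": [balrog_task_id],
            "task": {
                "workerType": "mozmill-update-tests",  # hypothetical name
                "metadata": {
                    "name": "Update tests %s %s" % (platform, locale),
                },
                "payload": {
                    # Hypothetical command line, for illustration only.
                    "command": ["run-update-tests", "--locale", locale],
                },
            },
        })
    return nodes


nodes = update_test_tasks("balrog-task-id", ["de", "en-US", "fr"])
print(len(nodes))
```

Keeping the update tests as leaf nodes means nothing else in the graph has to wait for them.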
My first steps will be to get familiar with task graphs and to go through the TC documentation.
I just had a chat with dustin on IRC regarding the possible distributions and versions of Linux we could use to run our tests on:
<whimboo> hi. trying out the task creator right now. i see a defualt image of Ubuntu 13.10. Which others do we actually support?
<dustin> whimboo: anything on docker
<whimboo> dustin: would that also apply for the new taskcluster workers?
<whimboo> dustin: means i could even run my tests with a linux 32bit even we don't build nor test anything on that platform
<dustin> whimboo: sure
<whimboo> dustin: lovely. thanks
That means as long as we support 32bit builds of Firefox on Linux we also have to run our tests for it.
(In reply to Henrik Skupin (:whimboo) from comment #0)
> As discussed in a meeting and of March with Selena we would like to start
> the migration from our custom CI to Taskcluster. The first step should be to
> get the functional and update tests run on Taskcluster nodes, and not our
> own ones. That means mozmill-ci will create its own task graphs and submit
> to Taskcluster once requests are coming in via Mozilla Pulse. The reporting
> will happen to Treeherder.
> In terms of machine load it will add less additional tasks for now. As a
> starter we want to test about 5 locales for nightly builds on
> mozilla-central and mozilla-aurora. It means 20 tasks per day in case there
> is only a single nightly build per branch.
> If all works fine we would like to also expand our testing to cover all
> locales. Selena, mind telling me upfront if that will become a problem
> regarding of the current capacity? We talk about 100 locales here which
> means about 400 tasks a day. Each task takes about 15min.
No problem with capacity. We run in AWS, and we dynamically respond to load increases.
Thanks Selena! Looks like we are good to get all the locales tested then. That is great to hear.
Nick, as we talked about last week I will put a ni? on you regarding the possible testing method for the funsize taskgraph.
Oops, this was totally for Rail and not Nick! ;)
So I had an extensive chat with Peter this morning about all aspects of my planned work. It was a lot of information, so I'm trying to write it down here in a structured way:
It doesn't matter where the workers live. On Linux we can thankfully rely on docker and all the images that are already available. Given that we already run our functional tests via the in-tree taskcluster configuration, we could use the desktop_test worker_type, or at least its corresponding docker image. To isolate our own tasks it would still be good to define our own worker type, because it gets separate resources allocated in AWS and does not share queues with other tasks.
We also spoke about Windows, which is coming next for me, maybe next quarter, so that I know how to better build up my tools to support different platforms. For that platform we could re-use our machines from mozmill-ci, but could also use AWS. I would very much like to make use of the latter by then. The only requirement is AMIs, which need to be created for our needs. That would have to be done locally in a VM with some PowerShell code like https://github.com/taskcluster/generic-worker/blob/master/worker_types/win2012r2/userdata. I will post more details on a follow-up bug where it's more appropriate.
Once a new worker type has been created (here an example: https://github.com/taskcluster/generic-worker/blob/master/worker_types/win2012r2/worker-definition.json) the TC team can help in setting it up in AWS. An overview of existing worker types can be found here: https://tools.taskcluster.net/aws-provisioner/. Let's keep in mind that workers need a different name for each platform.
For the update tests there would be two possibilities. First (as mentioned above) to integrate our task in the funsize graph, or second to create our own tasks by listening to the Mozilla Pulse messages. Given that we won't have any task graph yet for nightly builds, l10n repacks, and release builds, my initial thought is to still use our Pulse listener from Mozmill-CI for all types of tasks involved. Those are needed for update and functional tests for nightly and release builds across locales, though we potentially want full locale coverage only for the workers in AWS.
The task definition should happen in a template (yml file) which then gets populated with actual data. The resulting blob will then be used to create the task. Here it would be good to minimize the time a job runs. That means we will have to create a separate task for each combination of platform, version, and locale.
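A minimal sketch of the template idea, using only the Python standard library. The template fields, the worker type name, and the platform, version, and locale values are placeholders for illustration; a real template would carry the complete task definition and would be parsed as YAML before the task gets created.

```python
from itertools import product
from string import Template

# Hypothetical, heavily shortened task template (YAML).
TASK_TEMPLATE = Template("""\
workerType: $worker_type
metadata:
  name: "Firefox $test_type tests ($platform, $version, $locale)"
payload:
  env:
    PLATFORM: $platform
    VERSION: "$version"
    LOCALE: $locale
""")


def render_tasks(test_type, platforms, versions, locales,
                 worker_type="mozmill-tests"):
    """Render one task definition per (platform, version, locale) combination."""
    return [
        TASK_TEMPLATE.substitute(
            worker_type=worker_type, test_type=test_type,
            platform=platform, version=version, locale=locale,
        )
        for platform, version, locale in product(platforms, versions, locales)
    ]


# Example values only; the version string is made up.
blobs = render_tasks("update", ["linux64"], ["49.0a1"], ["de", "en-US", "fr"])
print(len(blobs))
print(blobs[0])
```

`Template.substitute` raises a `KeyError` if a placeholder is left unfilled, which helps to catch incomplete task definitions before anything is submitted.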
(In reply to Henrik Skupin (:whimboo) from comment #4)
> as we talked about last week I will put a ni? on you regarding the
> possible testing method for the funsize taskgraph.
Funsize extensively uses Balrog to generate combinations of versions to generate partials against. Balrog is behind VPN, so it's not that easy to test without having access to it.
I'd recommend the following two-phase scenario:
Phase 1: Funsize independent tests
You can grab an existing funsize graph and create another task which would consume the artifacts of that graph without modifying the original graph. We can run this phase until you are happy with the test execution.
Phase 2: Integrate the tests into funsize graph
This shouldn't affect the result of funsize, because the tests won't be blocking for other funsize tasks.
Does the plan make sense?
Thanks for the reply, Rail. Over the last few days I had some time to dig into TC, and what I most likely want as a first step is to have everything combined in mozmill-ci by still using the Pulse listener. That was also an option we discussed in our call. That way I have all the code together for nightly and release builds. Once the big graph and funsize are in tree, we can move tasks over. That should even be easier for testing.
Dustin, is it correct that the desktop-test docker image is based off Ubuntu 12.04? If that is the case I would like to create similar images but for newer LTS releases like 16.04 and/or 14.04. Those are the ones we would have to use for our tests, and I could imagine that the definitions should end up in tree under testing/docker?
Yes, please put the definitions in-tree. You'll note that there are actually three different images, one built on the next, used to create desktop-test. You don't necessarily need to replicate that for your images.
As of today I got my first self-triggered task executed in Taskcluster and reporting correctly to Treeherder:
There is now a lot of refactoring and clean-up of code which I will have to do for mozmill-ci.
That's awesome, nice work whimboo!
I figured out that running Linux32 builds on a desktop-test docker image doesn't work because the image is 64bit. After talking to Dustin on IRC, it looks like we don't really have a story for Linux32 testing yet, and may not have one at all.
Given that our tests rely on docker images as created by in-tree docker configs, and that there is no Linux32 docker-test image available, I can only cover Linux64 tests for now.
As a further update, I can report that both functional and update tests are working successfully now.
I got all the work done for this bug except for the mozilla-aurora case, which will not work at the moment because it needs something we don't have yet: a way to set the update channel to auroratest after a tree merge. This is more work, which I would like to tackle next quarter. For now I think we should follow what we usually do in-tree and let code ride the trains, meaning let's start with mozilla-central first and handle functional and update tests for linux64 in Taskcluster.
Example reports can be found here:
I filed the following issue on the mozmill-ci repository to get the PR reviewed:
The new feature is live on mozmill-ci staging now and will be pushed to production on Monday if no regressions are noticed.
This has been active for a while now without any regressions. So I'm going to close this bug.