[provisioner] Support DigitalOcean droplets

Status

RESOLVED INACTIVE
4 years ago
4 months ago

People

(Reporter: ted, Unassigned)

Details

rr currently doesn't have automated tests because it requires perf counters which aren't enabled on most cloud hosts. But it turns out that DigitalOcean has them enabled:
https://github.com/mozilla/rr/issues/1433#issuecomment-96875732

It'd be great if we had DigitalOcean droplet support in the provisioner so we could spin up on-demand builders for rr CI. The cheap droplets are pretty slow at building and testing rr, so it'd be nice to use a beefy instance and not have to run it 24/7 (actual commit volume is pretty low).

This should probably be a separate provisioner that only provisions for digital-ocean.

@ted,
Is there any reason not to do the build on a beefy EC2 instance, then do the tests in a separate task on a droplet? Would rr tests (no build) require a beefy droplet?
Or do build and test have to be on the same machine?

Note: this would also require a deployment of docker-worker on digital-ocean, but I suspect that's pretty easy.

At some point we'll want to be able to run Firefox tests under rr on DigitalOcean too. That's a bit further off, though.

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #1)
> @ted,
> Is there any reason not to do the build on a beefy EC2 instance. Then do the
> tests in a separate task on
> a droplet. Would rr tests (no build) require a beefy droplet?
> Or does build and test have to be on the same machine?

They don't have to be on the same machine, but rr builds take a lot less time than running the tests. On my quad-core machine it's 27s to build from scratch and 108s to run the tests (fastcheck). So I don't think that would be much of a win, especially after you factor in the cost of moving the build products between machines.

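Using the quad-core timings above, the build is only a fifth of the end-to-end run, so moving it to a separate EC2 task could save at most about 20% of the droplet time, and that is before subtracting the cost of transferring the build products:

```latex
\[
\frac{t_{\text{build}}}{t_{\text{build}} + t_{\text{test}}}
  = \frac{27\,\text{s}}{27\,\text{s} + 108\,\text{s}}
  = \frac{27}{135}
  = 0.2
\]
```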
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #1)
> This should probably be a separate provisioner that only provisions for
> digital-ocean.

That seems fine; I didn't know exactly how it'd fit into the current TC architecture.


> @ted,
> Is there any reason not to do the build on a beefy EC2 instance. Then do the
> tests in a separate task on
> a droplet. Would rr tests (no build) require a beefy droplet?
> Or does build and test have to be on the same machine?

As roc points out, the build isn't a big problem, and splitting build and test up is probably more work than it would save us.


> Note, this would also require a deployment of docker-worker on digital-ocean,
> but I suspect that's pretty easy.

It looks like Packer already has DigitalOcean support, so this might actually be easy:
https://www.packer.io/docs/builders/digitalocean.html
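As a rough illustration of what a DigitalOcean image build could look like, the Python sketch below emits a Packer JSON template. The builder keys (`type`, `api_token`, `image`, `region`, `size`, `ssh_username`, `snapshot_name`) are those of Packer's DigitalOcean builder; the image/region/size slugs, the `DIGITALOCEAN_API_TOKEN` environment variable, and `scripts/install-docker-worker.sh` are hypothetical placeholders, not the real docker-worker packaging.

```python
import json


def packer_template(image="ubuntu-14-04-x64", region="nyc3", size="s-4vcpu-8gb"):
    """Return a Packer JSON template (as a dict) for a DigitalOcean snapshot.

    The default slugs are placeholders; pick real ones from the DigitalOcean API.
    """
    return {
        "variables": {
            # Hypothetical env var name; the token is read when `packer build` runs.
            "do_api_token": "{{env `DIGITALOCEAN_API_TOKEN`}}",
        },
        "builders": [
            {
                "type": "digitalocean",
                "api_token": "{{user `do_api_token`}}",
                "image": image,
                "region": region,
                "size": size,
                "ssh_username": "root",
                "snapshot_name": "docker-worker-{{timestamp}}",
            }
        ],
        "provisioners": [
            {
                "type": "shell",
                # Hypothetical script standing in for the existing docker-worker setup.
                "script": "scripts/install-docker-worker.sh",
            }
        ],
    }


if __name__ == "__main__":
    # Print the template so it can be redirected to a file and fed to `packer build`.
    print(json.dumps(packer_template(), indent=2))
```

Usage would be along the lines of `python make_template.py > docker-worker-do.json` followed by `packer build docker-worker-do.json` (the script name is illustrative). The resulting snapshot is what an on-demand provisioner would boot droplets from.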

Yeah, this shouldn't be super complicated.
Provisioning gets hard if we have huge load and don't want to pay too much or wait too long :)

Otherwise, this is mainly a matter of configuring docker-worker, which we already package with Packer, and setting up a provisioner to launch nodes on DigitalOcean.

If the DO API is less 'eventually' consistent than the EC2 one, porting the aws-provisioner to DO should be fairly trivial. It could probably be done in the same repository, as a configuration option!

@jhford,
DigitalOcean is a different concept: no spot nodes, no security groups, and a very different API.
No aws-manager, and nowhere near the same number of config options. Reusing aws-provisioner sounds sketchy to me; it's already complicated enough.

I suspect we can get by with something very simple/stupid, i.e. a background worker configured with env variables, and then build something smarter when we have more load and need it.
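One way to read "a background worker configured with env variables" is a loop that, on each tick, compares pending rr tasks against running droplets and creates new ones through DigitalOcean's v2 REST API (`POST /v2/droplets` with a bearer token). This is a hypothetical sketch: the environment variable names, default slugs, droplet names, and the naive one-task-per-droplet policy are assumptions, and fetching the pending-task count from the queue and destroying idle droplets are left out.

```python
import json
import os
import urllib.request
from dataclasses import dataclass

# DigitalOcean API v2 endpoint for creating droplets.
DROPLETS_URL = "https://api.digitalocean.com/v2/droplets"


@dataclass
class Config:
    token: str
    region: str
    size: str
    image: object  # snapshot ID (int) or image slug (str)
    max_droplets: int

    @classmethod
    def from_env(cls, env=os.environ):
        # Hypothetical variable names and defaults; nothing in this bug defines them.
        image = env["DO_IMAGE"]
        return cls(
            token=env["DO_API_TOKEN"],
            region=env.get("DO_REGION", "nyc3"),
            size=env.get("DO_SIZE", "s-4vcpu-8gb"),
            image=int(image) if image.isdigit() else image,
            max_droplets=int(env.get("DO_MAX_DROPLETS", "2")),
        )


def droplets_to_launch(pending_tasks, running, max_droplets):
    """Naive policy: assume each running droplet will claim one pending task,
    launch one droplet per remaining task, and never exceed the cap."""
    wanted = pending_tasks - running
    return max(0, min(wanted, max_droplets - running))


def create_request(cfg, name):
    """Build (but do not send) the POST that creates one droplet."""
    body = {"name": name, "region": cfg.region, "size": cfg.size, "image": cfg.image}
    return urllib.request.Request(
        DROPLETS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + cfg.token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def tick(cfg, pending_tasks, running, send=urllib.request.urlopen):
    """One provisioning pass; `send` is injectable so the logic can be tested offline."""
    n = droplets_to_launch(pending_tasks, running, cfg.max_droplets)
    for i in range(n):
        # Illustrative naming scheme only.
        send(create_request(cfg, "rr-docker-worker-%d" % (running + i)))
    return n
```

Keeping the capacity decision in a pure function makes the "simple/stupid" policy easy to swap for something smarter later without touching the API plumbing.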

Does anybody know if Google Compute Engine would work too? From what I read, it's running KVM too...

Given their new pricing model (preemptible VMs), they would be preferable to DigitalOcean.
In fact, we should probably experiment with GCE as an alternative to EC2 Spot Nodes.

GCE does not work for rr. Google is running with performance counters disabled.

Component: TaskCluster → AWS-Provisioner
Product: Testing → Taskcluster

Found in triage.

Still valid, and may be easier following changes to the provisioner for redeployability.

This is no longer necessary for the rr use case, as the newer c5 instances in EC2 provide access to the performance counters that rr needs.
We have other higher-priority cloud providers to support currently.

Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → INACTIVE