Closed Bug 1600827 Opened 4 years ago Closed 4 years ago

Migrate custom infra scripts to use docker-compose

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mhentges, Assigned: sheehan)

References

Details

Attachments

(10 files)


We want to improve the speed and maintainability of our end-to-end tests while maintaining a close similarity between the testing environment and production.


We have some custom scripts to create our environment for testing/production, but as Connor pointed out last week, these are tough to maintain and extend.

Migrating these to docker-compose should help progress bug 1598958 and improve our testing/deployment story.

Blocks: 1598958
Assignee: nobody → sheehan
Status: NEW → ASSIGNED

Our custom Ansible-on-Docker build process tags containers using
a UUID to differentiate between each created image. Despite this,
almost all uses of the test suite involve using the most recent
image to invoke tests. This commit switches the image tag from a
UUID to a tag for only the latest image built from the Ansible-on-Docker
process.

This change will also allow us to work around some limitations in
docker-compose's ability to work with offline Docker builds and
repositories.

The hgmo stop command is unused in the repo and mostly redundant due to hgmo clean.

Depends on D66158

The start-kafka test script, which is used to wait on Zookeeper
to be fully up and ready before starting Kafka on a host, is
littered with race conditions. The script waits on several indicators
that, after some research, seem to be mostly arbitrary and unnecessary
to wait on before starting Kafka. This commit simplifies the script
to instead call the stat command on each known Zookeeper endpoint
and wait until the response is not empty, which is the key metric
to assert a Zookeeper endpoint is ready to accept connections from
Kafka. We also add a check to wait on the /kafka-servers test file
to be populated before attempting to use it when starting Kafka.

Depends on D66159

docker-compose normalizes project names by removing punctuation and
whitespace before assigning them to docker container labels. For
example, a project name like hello-there! will be converted to
hellothere when used as a label. To allow creating docker test
clusters with unique names in tests, a future commit will teach
the startup script to use the $TESTNAME environment variable
as the project name passed to docker-compose. Since we will
want to use the test name to identify which containers belong to
the cluster when orchestrating Kafka startup, we will need to
have a normalized copy of this test name. This commit introduces
a function to strip whitespace and punctuation from the test name,
which will be used by later commits when specifying docker-compose
project names. A docstring test is added to show the behaviour.
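
A minimal sketch of such a normalization function, assuming that
"punctuation" means Python's string.punctuation set. The function
name is hypothetical, and this version does not change letter case,
which the real docker-compose normalization may also do.

```python
import string

# Characters dropped from the test name (assumed set, see note above).
_STRIPPED = set(string.punctuation) | set(string.whitespace)


def normalize_testname(testname):
    """Strip whitespace and punctuation from a test name.

    >>> normalize_testname("hello-there!")
    'hellothere'
    >>> normalize_testname("test-auth.t")
    'testautht'
    """
    return "".join(c for c in testname if c not in _STRIPPED)
```

The docstring examples can be checked with python -m doctest.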

Depends on D66160

We have some code that dynamically changes ldap configs at
cluster startup. However, the values they are changed to are actually
constant, so the runtime change is unnecessary and the values can
simply be hard-coded into the image.

Depends on D66161

Add docker-compose to both the Python 2 and Python 3 virtualenvs,
and update several other dependencies in our test environment while
we're at it.

Depends on D66162

This commit adds a docker-compose file to stand up a test
cluster for hg.mozilla.org. We define 5 services representing
the hgssh push server, the two hgweb servers, and
containers for Pulse and LDAP. Containers are started from
images with a tag the same name as the image repository (ie
hgmaster:hgmaster), mirroring the latest result of the
Ansible-on-Docker local build process. We define a bridge
network that all services are connected to, allowing us to
run tests against the cluster from .t tests running on the
host. Each machine which will have Kafka installed on it is
assigned a unique BROKER_ID which represents its ID in
the Kafka cluster. Since we want to specify a different broker
ID for each hgweb service, they are defined as carbon copies
of one another instead of using the --scale CLI flag.

An optional environment variable, MASTER_SSH_PORT can be
passed to the docker-compose up process to allow specifying
the SSH port to be assigned on the host machine. This feature
exists in the current test cluster implementation to allow a
cluster to be started on a port already selected by the Mercurial
test harness as a regex-replaceable number. This allows our
docker containers to print the port in test output and have
the expected result in our test be $HGPORT, giving a passing
test.

By placing the variable immediately next to the SSH port
value (22), we make the variable optional, as omitting the variable
will cause the placeholder to interpolate to a blank string,
leaving just "22". This means we can use this docker-compose file
to shut down a cluster using only the project name (ie without
the master ssh port number) and allows us to test the docker-compose
file itself without finding an unused port on the host system.
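
The interpolation behaviour can be imitated in Python with
string.Template, which uses the same ${NAME} placeholder syntax.
This is only a model of the idea, not the compose file itself. In
particular, the exact form of the MASTER_SSH_PORT value is an
assumption here: the sketch assumes it carries the host port followed
by a colon, so that the result is a host:container mapping.

```python
from string import Template

# Placeholder sits directly against the container SSH port (22).
PORT_MAPPING = Template("${MASTER_SSH_PORT}22")


def render_port_mapping(env):
    """Interpolate the mapping, treating an unset variable as blank."""
    return PORT_MAPPING.substitute(
        MASTER_SSH_PORT=env.get("MASTER_SSH_PORT", ""))


# Variable omitted: only the container port remains.
print(render_port_mapping({}))
# Assumed value format "<host port>:" gives a full mapping.
print(render_port_mapping({"MASTER_SSH_PORT": "2222:"}))
```

With only a container port given, Docker picks a free host port
itself, which is why nothing needs to be known about the host when
shutting down or testing.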

Depends on D66163

This commit adds a Python command to shut down an hgcluster
with the specified project name. The shutdown command keeps running
in the background after the function returns, and the output
of the command can be shown using the show_output function.
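
One way a background shutdown like this could look, as a sketch
rather than the code that landed. The compose file path, the
function signatures and the injectable popen parameter are
assumptions; only the docker-compose --file and --project-name
options and the down subcommand are taken as given.

```python
import subprocess


def compose_down_command(project_name, compose_file="docker-compose.yml"):
    """Build the docker-compose invocation that tears down one project."""
    return ["docker-compose", "--file", compose_file,
            "--project-name", project_name, "down"]


def shutdown_cluster(project_name, compose_file="docker-compose.yml",
                     popen=subprocess.Popen):
    """Start the shutdown and return immediately with the process handle."""
    return popen(compose_down_command(project_name, compose_file),
                 stdout=subprocess.PIPE, stderr=subprocess.STDOUT)


def show_output(proc):
    """Wait for the shutdown process to finish and return its output."""
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace")
```

The caller keeps the returned handle and only blocks when it actually
wants the output.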

Depends on D66164

More code runs a string replacement of some value to hgssh at startup.
This commit moves the hard-coded string into the docker image by
adding it as a step in the test-hg-web Ansible role.

Depends on D66456

This commit switches the ./hgmo test cluster driver to use
docker-compose to stand up and pull down the clusters.

The cluster switches from management via a state file to instead
be managed using the TESTNAME environment variable. This variable
is set in all .t tests and can be used to uniquely identify
clusters in test runs. All code paths that previously required
loading cluster state from the state file now make calls to the
docker API and return a dict representing the values previously
tracked by the state file.

We add a get_cluster_containers function that calls out to the
docker API. We list all containers using the sparse parameter
to avoid inspecting them (both a speedup and avoiding a race
condition). We filter the returned containers on the docker-compose
inserted label representing the project name (which is the normalized
value of the TESTNAME environment variable).
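
A rough sketch of that lookup, assuming the docker Python package and
a client created with docker.from_env(). The function body is
illustrative, not the landed code, and whether stopped containers are
included (all=True) is an assumption. com.docker.compose.project is
the label docker-compose attaches to each container it creates.

```python
COMPOSE_PROJECT_LABEL = "com.docker.compose.project"


def get_cluster_containers(client, project_name):
    """Return the containers belonging to one docker-compose project.

    `client` is expected to behave like docker.from_env(). Listing with
    sparse=True skips the per-container inspect call, so labels are read
    from the raw list entry in `attrs` instead of the inspected config.
    """
    containers = client.containers.list(all=True, sparse=True)
    return [
        c for c in containers
        if (c.attrs.get("Labels") or {}).get(COMPOSE_PROJECT_LABEL)
        == project_name
    ]
```

Usage would be along the lines of
get_cluster_containers(docker.from_env(), normalized_test_name).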

Most API calls switch from using the docker.api module (which
returns raw dicts representing the JSON returned from the
Docker API) to using the docker module directly. In cases where
we want to access data not available using the docker module
objects, we pass container.attrs to achieve the same result.

The auto_clean_orphans context manager is changed from making
direct API calls to Docker, to instead running docker-compose down
against each Docker test.

With these changes, several other processes in the test cluster are
able to be simplified. The kafka-servers asynchronous coordination
between the cluster starter and the containers is completely removed,
since the values passed around can be constants with docker-compose
aliasing (or these values were already constants in practice). The
ZOOKEEPER_BROKER_ID and KAFKA_BROKER_ID environment variables
are switched to a single BROKER_ID environment variable, since
these values were always the same in practice. All rewrites of
vcsreplicator configs are switched to hard-coded test values in
the test-X Ansible roles, or are switched to be based off the
BROKER_ID environment variable where appropriate.

Depends on D66457

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/5eea5b428956
testing: switch from uuid image tags to tags named after containers r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/d41c9a030f19
testing: remove hgmo stop command r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/b28246a9328d
ansible/test-kafkabroker: simplify start-kafka script r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ad558ebe27f5
testing: add a function to normalize test names for use with docker-compose r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/624c5c392da9
testing: make constant ldap configs hard-coded r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/657515204d0a
testing: add docker-compose as a test requirement r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/a7055bf87558
testing: add a docker-compose file for standing up an hgmo test cluster r=mhentges
https://hg.mozilla.org/hgcustom/version-control-tools/rev/bd70d6020f84
testing: add a command to shut down an hgcluster with docker-compose r=mhentges,zeid
https://hg.mozilla.org/hgcustom/version-control-tools/rev/34ec2133ee3c
ansible/test-hg-web: remove unnecessary hard-coding of hgssh string in the startup phase r=mhentges,zeid
https://hg.mozilla.org/hgcustom/version-control-tools/rev/43c1b3e0d71f
testing: start hgcluster using docker-compose r=mhentges,zeid

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED