Closed Bug 1492271 Opened 6 years ago Closed 6 years ago

Build a cluster-wide integration-testing framework

Categories

(Taskcluster :: Services, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1575956

People

(Reporter: dustin, Unassigned)

We should have a robust integration / smoke testing framework that can be run against an active cluster. This will relieve some of the pressure on per-service unit tests to cover integration-related concerns (for example, does the index properly index real pulse messages from the queue, or are its fake messages subtly different?). It also means we can test things that are difficult to fake, such as workers' interactions with the rest of the cluster.

Per discussion with Brian:

* It'd be nice to have the tests relevant to a particular service or repo be located *in* that repo, and extracted during the cluster build process. In which case it'd also be nice to be able to run those alone without building/deploying (to iterate on the tests themselves, for example). The other option is a dedicated repo.
* Where a test originates from service A, but involves services B and C, it should depend only on stable, documented behavior of B and C, but can rely on implementation details of A. Then we'll see bustage in smoke tests if we accidentally change documented behavior in B and C.
* This should have some amount of UI, so that it can be triggered and the results reviewed in a browser aimed at the deployment in question. This doesn't have to be in the tc-web/tc-tools UI, but maybe that makes sense.
* These tests will probably make assumptions about cluster configuration, such as what services are enabled. They should get access to some of that configuration to decide when to skip a test.
* These tests will need some configuration in place. Is it enough to create a client that has access to set up that configuration? Can that be done with sufficiently limited scopes that it's not dangerous to have active in Firefox CI production?
* The taskcluster-diagnostics repo might provide a starting place - it's basically a `mocha` run with some reporting hooked up. Another alternative is a task-like interface where we run a collection of commands in docker images. That would let tests be language-specific, and give a nice consistent API for how each test case is run and how its results are reported.

This is not high-priority, but I'll continue to think about this and start with something simple that can provide a robust kernel for a fuller implementation to develop.
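
To make the "decide when to skip a test" point concrete, here is a minimal sketch in Go of a runner that skips checks whose services are not enabled. Everything in it is an assumption for illustration: `ClusterConfig`, `Check`, `Result`, and `RunChecks` are hypothetical names, not part of any existing Taskcluster library, and the example check names are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterConfig is a hypothetical summary of the deployment under test.
type ClusterConfig struct {
	EnabledServices map[string]bool
}

// Check is a hypothetical smoke-test case. Owner is the service whose repo
// the test lives in; Requires lists every service the test touches.
type Check struct {
	Name     string
	Owner    string
	Requires []string
	Run      func() error
}

// Result records the outcome of one check.
type Result struct {
	Name   string
	Status string // "pass", "fail", or "skip"
	Detail string
}

// missingServices returns the required services that are not enabled.
func missingServices(cfg ClusterConfig, c Check) []string {
	var missing []string
	for _, svc := range c.Requires {
		if !cfg.EnabledServices[svc] {
			missing = append(missing, svc)
		}
	}
	sort.Strings(missing)
	return missing
}

// RunChecks runs each check in order, skipping any check that needs a
// service the cluster does not have enabled.
func RunChecks(cfg ClusterConfig, checks []Check) []Result {
	results := make([]Result, 0, len(checks))
	for _, c := range checks {
		if missing := missingServices(cfg, c); len(missing) > 0 {
			detail := fmt.Sprintf("services not enabled: %v", missing)
			results = append(results, Result{c.Name, "skip", detail})
			continue
		}
		if err := c.Run(); err != nil {
			results = append(results, Result{c.Name, "fail", err.Error()})
			continue
		}
		results = append(results, Result{c.Name, "pass", ""})
	}
	return results
}

func main() {
	// Made-up configuration: only the queue and index are enabled.
	cfg := ClusterConfig{EnabledServices: map[string]bool{"queue": true, "index": true}}
	checks := []Check{
		{Name: "index-sees-queue-messages", Owner: "index", Requires: []string{"queue", "index"},
			Run: func() error { return nil }},
		{Name: "github-status-posted", Owner: "github", Requires: []string{"queue", "github"},
			Run: func() error { return nil }},
		{Name: "queue-ping", Owner: "queue", Requires: []string{"queue"},
			Run: func() error { return fmt.Errorf("simulated failure") }},
	}
	for _, r := range RunChecks(cfg, checks) {
		fmt.Printf("%-28s %-5s %s\n", r.Name, r.Status, r.Detail)
	}
}
```

Keeping `Owner` and `Requires` as plain data would also let a build step collect checks from each service's repo and run one service's checks alone, though how that extraction would work is left open here.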
Depends on: tc-monorepo
Component: Redeployability → Services

I still like this idea, just shouldn't pretend I'm working on it.

Assignee: dustin → nobody

We talked a bit about this this week.

This should be a tool that can be used both by us and by an organization deploying TC -- a good way to get a sense of whether something is seriously broken, but without any attempt to cover all functionality. So, for example, cloudops could run it against new staging deployments to catch cases where something has gone wrong in the deployment process.

It can also cover some known regressions, so that it's useful during an outage. A current example: the queue's dependency resolver sometimes stops resolving dependencies. Integration tests could check whether dependencies resolve, giving an operations team a quick way to find out that this is the issue.
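
As a rough sketch of what such a dependency check might look like in Go: create an upstream task and a dependent task, complete the upstream one, then poll until the dependent task leaves the `unscheduled` state or a timeout expires. The `Queue` interface, its methods, the placeholder task IDs, and the in-memory `fakeQueue` are all hypothetical stand-ins, not the real Taskcluster client API. The state names are assumed to mirror the queue's task states.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Queue is a hypothetical, minimal view of the queue service used by the
// check. It is NOT the real Taskcluster client interface.
type Queue interface {
	CreateTask(taskID string, dependsOn []string) error
	Resolve(taskID string) error // mark a task as completed
	State(taskID string) (string, error)
}

// ErrDependencyStuck reports that the dependent task was never scheduled.
var ErrDependencyStuck = errors.New("dependent task still unscheduled after upstream completed")

// CheckDependencyResolution completes an upstream task and polls until the
// dependent task leaves "unscheduled", or returns an error on timeout.
func CheckDependencyResolution(q Queue, timeout, interval time.Duration) error {
	if err := q.CreateTask("upstream", nil); err != nil {
		return err
	}
	if err := q.CreateTask("dependent", []string{"upstream"}); err != nil {
		return err
	}
	if err := q.Resolve("upstream"); err != nil {
		return err
	}
	deadline := time.Now().Add(timeout)
	for {
		state, err := q.State("dependent")
		if err != nil {
			return err
		}
		if state != "unscheduled" {
			return nil // the resolver scheduled the dependent task
		}
		if time.Now().After(deadline) {
			return ErrDependencyStuck
		}
		time.Sleep(interval)
	}
}

// fakeQueue is an in-memory stand-in so the sketch runs without a cluster.
// resolverWorking=false simulates the stuck-resolver regression.
type fakeQueue struct {
	resolverWorking bool
	deps            map[string][]string
	states          map[string]string
}

func newFakeQueue(working bool) *fakeQueue {
	return &fakeQueue{working, map[string][]string{}, map[string]string{}}
}

func (f *fakeQueue) CreateTask(id string, dependsOn []string) error {
	f.deps[id] = dependsOn
	if len(dependsOn) == 0 {
		f.states[id] = "pending"
	} else {
		f.states[id] = "unscheduled"
	}
	return nil
}

func (f *fakeQueue) Resolve(id string) error {
	f.states[id] = "completed"
	if !f.resolverWorking {
		return nil // regression: nobody re-examines dependent tasks
	}
	for task, deps := range f.deps {
		ready := f.states[task] == "unscheduled"
		for _, d := range deps {
			ready = ready && f.states[d] == "completed"
		}
		if ready {
			f.states[task] = "pending"
		}
	}
	return nil
}

func (f *fakeQueue) State(id string) (string, error) {
	s, ok := f.states[id]
	if !ok {
		return "", fmt.Errorf("unknown task %q", id)
	}
	return s, nil
}

func main() {
	for _, working := range []bool{true, false} {
		err := CheckDependencyResolution(newFakeQueue(working), 50*time.Millisecond, 10*time.Millisecond)
		fmt.Printf("resolver working=%v: err=%v\n", working, err)
	}
}
```

Against a real cluster the check would need unique task IDs, a way to get the upstream task completed (a worker, or the check claiming and resolving it itself), and cleanup. Those details are left out of this sketch.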

We'd probably like to write this in Go, as part of Taskcluster-CLI. Something like `taskcluster diagnose`.

Oh, also, I think bstack agreed to write at least the framework of this.

Depends on: 1560650
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE