auto-configure nodes to join consul cluster

RESOLVED FIXED

Status

Socorro
Infra
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: rhelmer, Assigned: phrawzty)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
We have a staging consul cluster set up, but the nodes automatically spun up by the ASGs don't know how to connect to it.

phrawzty suggested using hiera-S3 to fetch the cluster addresses when the nodes are provisioned, which would work fine.

lonnen suggested we could use DNS for this; we should consider that too.
(Reporter)

Updated

3 years ago
Assignee: nobody → dmaher

Comment 1

3 years ago
Are these talking to an ELB?  I'd be mega curious to see this infra mapped out.
(Reporter)

Comment 2

3 years ago
(In reply to :jp JP Schneider from comment #1)
> Are these talking to an ELB?

There's no ELB, just an ASG with three nodes. The nodes run consul agents in server mode and are joined together into a cluster using "consul join <ip>" (consul uses the Raft consensus algorithm).

All Socorro nodes that want to get configuration run a consul agent in client mode, and need to join the consul server cluster. Running "consul join <ip>", where <ip> is any of the server nodes, is sufficient (there's no harm in joining all of them either).
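
For illustration, here's a rough sketch of the "any server will do" idea: try each known server in turn and stop at the first successful join. The function name `join_any` and the injectable `run` parameter are made up for this example; the only real thing it relies on is the "consul join <address>" command mentioned above.

```python
import subprocess
from typing import Callable, Iterable, Optional, Sequence


def join_any(
    servers: Iterable[str],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> Optional[str]:
    """Try `consul join <address>` against each server until one succeeds.

    Returns the address that accepted the join, or None if every attempt
    failed. `run` (hypothetical hook) takes an argv list and returns an exit
    code, so the logic can be exercised without a consul binary installed.
    """
    for address in servers:
        # A zero exit code is treated as a successful join.
        if run(["consul", "join", address]) == 0:
            return address
    return None
```

On a client node this would be called with whatever server addresses the node was given, e.g. `join_any(["10.0.0.1", "10.0.0.2", "10.0.0.3"])` (placeholder IPs).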

> I'd be mega curious to see this infra mapped out.

I'll make a pretty drawing :)
(Reporter)

Comment 3

3 years ago
Here's a simple conceptual drawing:

https://docs.google.com/drawings/d/1uX1piaZAOk1YsnuWSkyilGvxza9U9hdKtbQVDDUUlTI/edit?usp=sharing

The Consul cluster itself and all Socorro app nodes are spun up by ASGs. The apps are provisioned via AWS launch configs when the ASG spins up a node, and we want them to join the cluster automatically. If they knew the IPs of the consul servers, they could do that via puppet (puppet could pull the IPs from S3, say).
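
As a sketch of what the provisioning step could do with such a list: assuming (hypothetically) that the S3 object is a plain text file with one server address per line, it could be turned into a consul agent config fragment using the `retry_join` setting, provided the consul version in use supports it. The file format and the name `render_join_config` are assumptions for this example, and the actual S3 fetch is left out.

```python
import json


def render_join_config(server_list: str) -> str:
    """Turn a newline-separated address list into a consul agent config fragment.

    Blank lines and lines starting with '#' are ignored. The input format is
    an assumption for this sketch, not something the cluster defines.
    """
    addresses = [
        line.strip()
        for line in server_list.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    # `retry_join` lists the addresses the agent keeps trying to join at startup.
    return json.dumps({"retry_join": addresses}, indent=2)
```

Puppet (or a boot script) would write the returned JSON into the agent's config directory before starting the agent, which avoids running "consul join" by hand.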

If we do DNS, however, then I think an ELB in front of the consul nodes makes sense, since the apps could just "consul join <dns name>", and it doesn't particularly matter which server node they join as long as it's healthy. The ELB check is enough to tell that the consul server is running, but not necessarily that the cluster is healthy (we may be able to pull this off by hitting the right URL, need to investigate).
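
One candidate for "the right URL" is consul's HTTP status endpoint, `/v1/status/leader`, which returns the Raft leader's address as a JSON string (empty if there is no leader). Below is a minimal sketch of a check built on it; whether an ELB health check can use this directly still needs investigating, since the check would probably need to turn "no leader" into a non-200 response. The names `fetch` and `cluster_has_leader` are made up for this example.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Return the response body for `url` as text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def cluster_has_leader(base_url: str, get: Callable[[str], str] = fetch) -> bool:
    """Return True if the agent at `base_url` reports a Raft leader.

    `get` (hypothetical hook) maps a URL to a response body, so the check can
    be tested without a running agent.
    """
    try:
        leader = json.loads(get(base_url.rstrip("/") + "/v1/status/leader"))
    except (OSError, ValueError):
        # Network failures and unparseable bodies both count as unhealthy.
        return False
    return isinstance(leader, str) and leader != ""
```

Against a local agent on the default HTTP port this would be `cluster_has_leader("http://localhost:8500")`.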

Note that we currently don't have a great solution for consul *server* nodes to auto-discover each other, or an existing cluster. Right now we have to manually join new consul nodes to the server cluster. DNS+ELB might work there for an existing cluster (bootstrapping a new cluster would still need to be done manually).

I think this last issue (consul servers auto-joining the cluster) deserves its own bug.
(Reporter)

Updated

3 years ago
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED