Closed Bug 581187 Opened 14 years ago Closed 14 years ago

Need a way to mark a back-end server as down

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zandr, Assigned: telliott)

References

Details

(Whiteboard: [qa-])

Back-end databases often need to go down for maintenance or repair.

We need a way to tell the system that this has happened. As it is, we have a couple of bad behaviors associated with simply taking down a back-end DB host.

We may still assign nodes to the down server. Mitigating this in the current system requires removing the nodes from node_config.json (which is JSON, so entries can't simply be commented out) and setting `ct` to 0 in the available_nodes table on the admin host. A hackish alternative is to crank up the actives on the node, which blocks new assignments until approximately 1am but could pollute metrics. If a new user gets 503'd, we think they get an 'unknown' error. This is untested.

If the host is all the way down, instead of merely refusing MySQL connections, then the webheads run out of Apache processes. This is because the MySQL connection timeout is long (60s). We can shorten this, but even at 5s, I could see running out of Apache processes at high load.
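
To put rough numbers on the process exhaustion: each request that reaches the dead host holds an Apache process for the full connection timeout, so the steady-state number of blocked processes is roughly (requests per second to that host) x (timeout). A minimal sketch of that arithmetic follows. The request rate and process limit are made-up illustration values, not measurements from the webheads.

```python
def blocked_processes(requests_per_sec, connect_timeout_s):
    """Approximate steady-state count of workers stuck waiting on a dead host.

    This is Little's law: items in the system = arrival rate * time in system.
    """
    return requests_per_sec * connect_timeout_s


# Hypothetical numbers, for illustration only.
MAX_PROCESSES = 256      # assumed per-webhead Apache process limit
RATE_TO_DEAD_HOST = 60   # assumed requests/sec routed to the down host

for timeout_s in (60, 5):
    blocked = blocked_processes(RATE_TO_DEAD_HOST, timeout_s)
    exhausted = blocked >= MAX_PROCESSES
    print(f"timeout={timeout_s}s blocked={blocked} exhausted={exhausted}")
```

Under these assumed values, even the 5s timeout leaves more processes blocked than the limit allows, which matches the concern about high load.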

Mitigating this requires repointing the shard_constants entries for the host at something that will refuse the db connection quickly. I've been using 127.0.0.1 for this.
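
Why 127.0.0.1 works as a stand-in can be checked with a small sketch: a connection to a local port with no listener is normally refused almost immediately instead of waiting out the timeout. The helper names and the port-picking trick below are assumptions for illustration, and the sketch uses a plain TCP socket in Python rather than a real MySQL client.

```python
import socket
import time


def time_connect(host, port, timeout_s):
    """Try a TCP connect and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "connected"
    except ConnectionRefusedError:
        outcome = "refused"      # fast failure: nothing is listening
    except socket.timeout:
        outcome = "timed out"    # slow failure: host is all the way down
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start


def unused_local_port():
    """Bind to port 0 to get a free port, then close it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    outcome, elapsed = time_connect("127.0.0.1", unused_local_port(), timeout_s=5)
    print(outcome, f"{elapsed:.3f}s")
```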

So having a single place to flag a server as 'down' that will avoid these ugly behaviors would be extremely valuable to ops.
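
One possible shape for such a flag, as a minimal sketch only: a single registry that both node assignment and the DB connection path consult. All names here (BackendRegistry, BackendDown, the host names) are hypothetical and are not taken from the actual server code.

```python
class BackendDown(Exception):
    """Raised instead of attempting a connection to a host flagged as down."""


class BackendRegistry:
    """Single place to record which back-end hosts are down (hypothetical)."""

    def __init__(self, hosts):
        self._hosts = list(hosts)
        self._down = set()

    def mark_down(self, host):
        if host not in self._hosts:
            raise KeyError(host)
        self._down.add(host)

    def mark_up(self, host):
        self._down.discard(host)

    def is_down(self, host):
        return host in self._down

    def assignable_hosts(self):
        # New users are never assigned to a host that is flagged down.
        return [h for h in self._hosts if h not in self._down]

    def check_before_connect(self, host):
        # Fail immediately rather than waiting out a connection timeout.
        if host in self._down:
            raise BackendDown(host)


registry = BackendRegistry(["db1", "db2", "db3"])
registry.mark_down("db2")
print(registry.assignable_hosts())
```

In this sketch a caller would catch BackendDown and return a 503 right away, so no Apache process sits waiting on the connection timeout.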
Blocks: 592376
http://hg.mozilla.org/services/reg-server-secure/rev/8c06e8ab82cb

allows us to mark nodes as downed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 598959
Whiteboard: [qa-]
Product: Cloud Services → Cloud Services Graveyard