Bug 581187
Opened 15 years ago
Closed 14 years ago
Need a way to mark a back-end server as down
Categories
(Cloud Services Graveyard :: Server: Sync, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: zandr, Assigned: telliott)
Details
(Whiteboard: [qa-])
Back-end databases often need to go down for maintenance or repair.
We need a way to tell the system that this has happened. As it stands, simply taking down a back-end DB host causes a couple of bad behaviors.
We may still assign nodes to the down server. Mitigating this in the current system requires removing the nodes from node_config.json (which is JSON, so entries can't simply be commented out) and setting `ct` to 0 in the available_nodes table on the admin host. A hackish alternative is to crank up the actives on the node, which keeps new assignments from happening until approximately 1am but could pollute metrics. If a new user gets a 503, we think they see an 'unknown' error; this is untested.
If the host is all the way down, rather than merely refusing MySQL connections, the webheads run out of Apache processes. This is because the MySQL connection timeout is long (60s). We can shorten this, but even at 5s I could see us running out of Apache processes under high load.
Mitigating this requires repointing the shard_constants entries for the host at something that will refuse the db connection quickly. I've been using 127.0.0.1 for this.
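The timeout concern above can be put in rough numbers. The following is a back-of-the-envelope sketch using Little's law (workers in flight ≈ arrival rate × time each one is held), not a measurement of the real system. The request rates and the pool size of 256 are hypothetical values chosen for illustration; only the 60s and 5s timeouts come from this bug.

```python
import math


def workers_tied_up(requests_per_second: float, connect_timeout_s: float) -> int:
    """Estimate how many worker processes are blocked waiting on a dead host.

    Little's law: in-flight = arrival rate * time each request is held.
    Assumes every request to the dead host blocks for the full timeout.
    """
    return math.ceil(requests_per_second * connect_timeout_s)


def exhausts_pool(requests_per_second: float, connect_timeout_s: float,
                  max_workers: int) -> bool:
    """True if the blocked workers alone would fill the whole pool."""
    return workers_tied_up(requests_per_second, connect_timeout_s) >= max_workers


if __name__ == "__main__":
    MAX_WORKERS = 256  # hypothetical pool size per webhead
    for rate in (50, 60):  # hypothetical requests/s routed to the dead host
        for timeout in (60, 5):  # timeouts mentioned in this bug
            print(rate, timeout, workers_tied_up(rate, timeout),
                  exhausts_pool(rate, timeout, MAX_WORKERS))
```

Under these assumptions, 50 requests per second against a 60s timeout would tie up about 3000 workers, and a 5s timeout still fills a 256-worker pool once the rate reaches 60 requests per second. A target that refuses the connection immediately keeps the hold time near zero.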
So a single place to flag a server as 'down', one that avoids these ugly behaviors, would be extremely valuable to ops.
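One way to picture such a flag is a single registry consulted both when assigning nodes and before opening a DB connection. This is a minimal sketch of the idea only; `BackendRegistry`, `BackendDown`, and the host names are hypothetical and do not reflect the actual reg-server code or its storage.

```python
class BackendDown(Exception):
    """Raised instead of attempting a connection to a flagged host."""


class BackendRegistry:
    """Single place that records which back-end hosts are flagged as down."""

    def __init__(self):
        self._down = set()

    def mark_down(self, host):
        self._down.add(host)

    def mark_up(self, host):
        self._down.discard(host)

    def is_down(self, host):
        return host in self._down

    def assignable(self, hosts):
        """Filter out flagged hosts so new users are not assigned to them."""
        return [h for h in hosts if h not in self._down]

    def connect(self, host, connector):
        """Fail fast for flagged hosts rather than waiting on a timeout."""
        if self.is_down(host):
            raise BackendDown(host)
        return connector(host)


if __name__ == "__main__":
    registry = BackendRegistry()
    registry.mark_down("db2.example")  # hypothetical host name
    print(registry.assignable(["db1.example", "db2.example"]))
    try:
        registry.connect("db2.example", lambda h: "connected to " + h)
    except BackendDown as exc:
        print("refused quickly:", exc)
```

Checking the flag before connecting addresses both behaviors described here: flagged hosts drop out of assignment, and requests for them fail immediately without holding a worker for the connection timeout.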
Comment 1 (Assignee) • 14 years ago
http://hg.mozilla.org/services/reg-server-secure/rev/8c06e8ab82cb
allows us to mark nodes as downed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•14 years ago
Whiteboard: [qa-]
Updated•2 years ago
Product: Cloud Services → Cloud Services Graveyard