Bug 1265003
Opened 9 years ago
Closed 9 years ago
Run zookeeper/kafka on hgweb[11-14]
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Assigned: gps)
We're currently running zookeeper/kafka on hgweb{1,9,10} and hgssh{2,3}. We should rebalance to hgweb[11-14] + hgssh3 (the current master).
Comment 1 (Assignee) • 9 years ago
This is done.
Unfortunately, I caused an outage in the process. I followed the same steps I used when I added hgssh3 (see https://kafka.apache.org/documentation.html#operations for info about balancing replicas). I'm not sure what happened, but the cluster got into a really bad state. I attempted recovery, but after a few minutes I gave up, nuked /var/lib/{kafka,zookeeper}, and started things from scratch. This is far from the ideal solution.
Looking at the Kafka docs, apparently there are some new-ish settings we can use to make recovery easier in the future. But hopefully we don't have to rebalance the cluster in a major way any time soon.
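For reference, the replica balancing described in the Kafka operations docs is driven by a JSON reassignment plan. The sketch below shows roughly what such a plan looks like. It is not the exact procedure used in this bug: the topic name, partition count, replication factor, and broker IDs are all hypothetical, and the simple round-robin placement is an illustration rather than Kafka's own assignment algorithm (the tool's `--generate` mode can propose a plan instead).

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment plan in the JSON shape that
    kafka-reassign-partitions.sh reads.

    Replicas are placed round-robin across broker_ids. This is a simple
    illustrative placement, not Kafka's built-in assignment algorithm.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds number of brokers")

    partitions = []
    for p in range(num_partitions):
        # Pick `replication_factor` consecutive brokers, starting at an
        # offset that advances by one for each partition.
        replicas = [
            broker_ids[(p + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})

    return {"version": 1, "partitions": partitions}


# Hypothetical values: the broker IDs here merely echo the host numbering
# (hgweb[11-14] + hgssh3); the real cluster's IDs and topics may differ.
plan = build_reassignment(
    topic="example-topic",
    num_partitions=8,
    broker_ids=[11, 12, 13, 14, 3],
    replication_factor=3,
)

print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and passed to `kafka-reassign-partitions.sh` via `--reassignment-json-file` with `--execute`, then checked with `--verify`. Moving one topic or a few partitions at a time, and verifying between steps, keeps a bad reassignment from affecting the whole cluster at once.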
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED