Closed
Bug 1434339
Opened 6 years ago
Closed 6 years ago
Upgrade zookeeper and kafka
Categories
(Developer Services :: Mercurial: hg.mozilla.org, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Assigned: sheehan)
Details
Attachments
(1 file, 4 obsolete files)
We're running very old versions of Zookeeper and Kafka on hg.mozilla.org. Let's get them running modern versions.
Reporter
Comment 1•6 years ago
I think this would be a good bug for Connor...
Assignee: gps → nobody
Status: ASSIGNED → NEW
Comment hidden (mozreview-request)
Comment 3•6 years ago
Connor - I talked to gps about this yesterday and we thought it would make sense to assign this to you so you could become familiar with the upgrade process. My notes from this discussion follow; I'm sure he has more details.
- Finish existing work first.
- Upgrade to stable versions before the move to the new data centre.
- Kafka is more involved (need to test whether we can update in place or have to take it down); ZooKeeper is easier.
- Mozilla-specific steps: we host our own packages.
- No existing doc; it would be good to create one for the next upgrade.
Assignee: nobody → sheehan
Assignee
Comment 4•6 years ago
mozreview-review
Comment on attachment 8947921 [details] hgmo: upgrade ZooKeeper to 3.4.11 (bug 1434339) https://reviewboard.mozilla.org/r/217602/#review234656
Attachment #8947921 -
Flags: review+
Pushed by cosheehan@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/e7dada846ef3 hgmo: upgrade ZooKeeper to 3.4.11 r=sheehan
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee
Updated•6 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee
Comment 10•6 years ago
The Kafka brokers (hgweb1{1..4} and hgssh4) are now running Zookeeper 3.4.11.
Reporter
Comment 11•6 years ago
mozreview-review
Comment on attachment 8961019 [details] kafka: add protocol version fields to kafka server config (Bug 1434339)
https://reviewboard.mozilla.org/r/229732/#review235884
The official docs don't use the 4th version field, only e.g. `0.8.2`. But if you've tested this and it works, then it is OK by me.
Attachment #8961019 -
Flags: review?(gps) → review+
Reporter
Comment 12•6 years ago
mozreview-review
Comment on attachment 8961020 [details] kafka: upgrade downloaded binaries and references in kafka.service (Bug 1434339)
https://reviewboard.mozilla.org/r/229734/#review235886

This is mostly good. Just a few low-level issues. And a higher-level concern about the use of `-daemon`.

::: ansible/roles/docker-kafkabroker/files/start-kafka:116
(Diff revision 1)
> - ':/opt/kafka/libs/jopt-simple-3.2.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-javadoc.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-scaladoc.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-sources.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-test.jar:/opt/kafka/libs/kafka-clients-0.8.2.2.jar:/opt/kafka/libs/log4j-1.2.16.jar:/opt/kafka/libs/lz4-1.2.0.jar:/opt/kafka/libs/metrics-core-2.2.0.jar:/opt/kafka/libs/scala-library-2.10.4.jar:/opt/kafka/libs/slf4j-api-1.7.6.jar:/opt/kafka/libs/slf4j-log4j12-1.6.1.jar:/opt/kafka/libs/snappy-java-1.1.1.6.jar:/opt/kafka/libs/zkclient-0.3.jar:/opt/kafka/libs/zookeeper-3.4.6.jar:/opt/kafka/core/build/libs/kafka_2.10*.jar',
> - 'kafka.Kafka',
> - '/etc/kafka/server.properties',
> -]
>
> -os.execl(command[0], *command)
> +os.execle('/opt/kafka/bin/kafka-server-start.sh', '-daemon', '/etc/kafka/server.properties', env)

Why `-daemon` here? I imagine daemon mode would only be needed for e.g. starting in a shell. supervisor and systemd both keep a handle on the started process. So I don't think we need to daemonize Kafka.

::: ansible/roles/kafka-broker/files/kafka.service:3
(Diff revision 1)
>  [Unit]
>  Description=Kafka distributed log server
> -After=network.target remote-fs.target nss-lookup.target
> +After=network.target remote-fs.target nss-lookup.target zookeeper.service

`After` is used to order startup when multiple units are started simultaneously. I think what you want here is `Requires`. That says "this service depends on another service; start that other service automatically when starting this one."

There is also `Requisite`, which expresses the same dependency except that starting a service will fail if the prerequisite services aren't started (instead of starting them automatically).

::: ansible/roles/kafka-broker/files/kafka.service:16
(Diff revision 1)
> -ExecStart=/usr/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/server-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dlog4j.configuration=file:/etc/kafka/log4j.properties -cp :/opt/kafka/libs/jopt-simple-3.2.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-javadoc.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-scaladoc.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-sources.jar:/opt/kafka/libs/kafka_2.10-0.8.2.2-test.jar:/opt/kafka/libs/kafka-clients-0.8.2.2.jar:/opt/kafka/libs/log4j-1.2.16.jar:/opt/kafka/libs/lz4-1.2.0.jar:/opt/kafka/libs/metrics-core-2.2.0.jar:/opt/kafka/libs/scala-library-2.10.4.jar:/opt/kafka/libs/slf4j-api-1.7.6.jar:/opt/kafka/libs/slf4j-log4j12-1.6.1.jar:/opt/kafka/libs/snappy-java-1.1.1.6.jar:/opt/kafka/libs/zkclient-0.3.jar:/opt/kafka/libs/zookeeper-3.4.6.jar:/opt/kafka/core/build/libs/kafka_2.10*.jar kafka.Kafka /etc/kafka/server.properties
> +# Set custom logging directories via environment variables
> +Environment="KAFKA_GC_LOG_OPTS=-Xloggc:/var/log/kafka/server-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"
> +Environment="KAFKA_LOG4J_OPTS=-Dlog4j.configuration=file:/etc/kafka/log4j.properties"
> +
> +ExecStart=/opt/kafka/bin/kafka-server-start.sh --daemon /etc/kafka/server.properties
> +ExecStop=/opt/kafka/bin/kafka-server-stop.sh

All `kafka-server-stop.sh` does is look for a java process running Kafka and send it a SIGTERM. https://github.com/apache/kafka/blob/trunk/bin/kafka-server-stop.sh This is the default behavior of systemd. So we don't need to define an `ExecStop` for this systemd service.

That being said, `--daemon` may confuse systemd. systemd assumes the launched process will be alive for as long as the service lives. If `--daemon` causes the process to fork and the original process to exit, then systemd will think the service stopped and will attempt to start it again. This /could/ result in dozens of Kafka processes running on the machine! We either want to drop `--daemon` or set `[Service] Type = forking` to tell systemd that the process will fork. I think we want to drop `--daemon` and have the process live in the foreground.

::: ansible/roles/kafka-broker/tasks/main.yml:18
(Diff revision 1)
>  get_url: url=https://s3-us-west-2.amazonaws.com/moz-packages/{{ item.path }}
>  dest=/var/tmp/{{ item.path }}
>  sha256sum={{ item.sha256 }}
>  with_items:
>  - { path: zookeeper-3.4.11.tar.gz, sha256: f6bd68a1c8f7c13ea4c2c99f13082d0d71ac464ffaf3bf7a365879ab6ad10e84 }
> - - { path: kafka_2.10-0.8.2.2.tgz, sha256: 3ba1967ee88c7f364964c8a8fdf6f5075dcf7572f8c9eb74f0285b308363ecab }
> + - { path: kafka_2.11-1.0.0.tgz, sha256: b5b535f8db770cda8513e391917d0f5a35ef24c537ef3d29dcd9aa287da529f5 }

Kafka 1.0.1 is out. So we might as well upgrade to that so we don't upgrade to a version with known bugs. I've uploaded kafka_2.11-1.0.1.tgz to the S3 bucket.

::: pylib/vcsreplicator/tests/test-cluster-unavailable.t:84
(Diff revision 1)
>
>  $ hgmo exec hgweb0 /usr/bin/supervisorctl start kafka
>  kafka: started
>  $ hgmo exec hgweb1 /usr/bin/supervisorctl start kafka
>  kafka: started
> + $ sleep 3

General rule: any time you need to add a `sleep` in a test to make the test pass, there's a race condition. A `sleep` will only ever make the test pass under some conditions (hopefully most). Under certain types of load, the test will still fail due to timing conditions.
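One common alternative to a fixed `sleep` is to poll for the condition the test actually depends on, with an upper bound on how long to wait. The sketch below is only an illustration of that pattern in Python; it is not code from version-control-tools, the test in question is a `.t` shell-style test, and the names `wait_until` and `port_open` are hypothetical.

```python
import socket
import time


def wait_until(predicate, timeout=30.0, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def port_open(host, port):
    """Hypothetical readiness check: True if a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False
```

A test could then call something like `wait_until(lambda: port_open("hgweb0", 9092))`, where the host name and port are assumptions for illustration, and fail explicitly when it returns False. An open port is only a stand-in for readiness; the right predicate is whatever state the following test step relies on.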
Attachment #8961020 -
Flags: review?(gps) → review-
Comment 13•6 years ago
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/bd0effc7f07c kafka: add protocol version fields to kafka server config r=gps
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Reporter
Comment 14•6 years ago
This will all land and get deployed incrementally.
Keywords: leave-open
Reporter
Updated•6 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 15•6 years ago
sheehan: would you mind closing the reviews on MozReview? Let's use Phabricator going forward.
Status: REOPENED → NEW
Flags: needinfo?(sheehan)
Assignee
Updated•6 years ago
Attachment #8961019 -
Attachment is obsolete: true
Assignee
Updated•6 years ago
Attachment #8961020 -
Attachment is obsolete: true
Assignee
Updated•6 years ago
Attachment #8961022 -
Attachment is obsolete: true
Attachment #8961022 -
Flags: review?(gps)
Assignee
Updated•6 years ago
Attachment #8961023 -
Attachment is obsolete: true
Attachment #8961023 -
Flags: review?(gps)
Assignee
Comment 16•6 years ago
Closed MozReview request in favor of Phabricator.
Flags: needinfo?(sheehan)
Comment 17•6 years ago
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/bb635063a4d1 kafka: upgrade downloaded binaries and references in kafka.service r=gps
Comment 18•6 years ago
Pushed by cosheehan@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/a82ba68806cb kafka: remove -daemon from kafka.service systemd unit
Comment 19•6 years ago
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/74d6ce44c3a0 kafka: upgrade inter.broker.protocol.version to 1.1 ; r=gps
Comment 20•6 years ago
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/8318cec2b37a kafka: upgrade log.message.format.version to 1.1 ; r=gps
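The commits above land in the order usually followed for a rolling Kafka upgrade: add the protocol version fields, upgrade the binaries, raise `inter.broker.protocol.version`, then raise `log.message.format.version`. A minimal sketch of that staging follows. The `OLD` value of `0.8.2` is an assumption for illustration (the value actually committed is not shown in this bug), and `STAGES`, `render_properties`, `version_tuple` and `check_stages` are hypothetical names, not part of version-control-tools or Kafka.

```python
# Illustrative staging of the two version properties during a rolling upgrade.
OLD = "0.8.2"  # assumed pre-upgrade value; not taken from the actual config
NEW = "1.1"

STAGES = [
    ("pin both versions, then upgrade binaries broker by broker",
     {"inter.broker.protocol.version": OLD, "log.message.format.version": OLD}),
    ("raise the inter-broker protocol version, rolling restart",
     {"inter.broker.protocol.version": NEW, "log.message.format.version": OLD}),
    ("raise the message format version, rolling restart",
     {"inter.broker.protocol.version": NEW, "log.message.format.version": NEW}),
]


def render_properties(settings):
    """Render a settings dict as server.properties-style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


def version_tuple(version):
    """Turn a dotted version string such as '0.8.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def check_stages(stages):
    """Check the ordering this sketch assumes: neither version moves backward,
    and the message format version never runs ahead of the protocol version."""
    previous = None
    for _, settings in stages:
        proto = version_tuple(settings["inter.broker.protocol.version"])
        fmt = version_tuple(settings["log.message.format.version"])
        if fmt > proto:
            return False
        if previous is not None and (proto < previous[0] or fmt < previous[1]):
            return False
        previous = (proto, fmt)
    return True
```

Calling `render_properties(STAGES[1][1])` gives the two lines a broker config would carry during the middle stage, when the brokers run the new binaries but still write the old message format.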
Assignee
Comment 21•6 years ago
The Kafka brokers are all running Kafka v1.1 as of earlier today. I wrote some documentation for the upgrade process in https://phabricator.services.mozilla.com/D887, which will land soon.
Status: NEW → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Updated•6 years ago
Keywords: leave-open
Comment 22•6 years ago
Pushed by cosheehan@mozilla.com: https://hg.mozilla.org/hgcustom/version-control-tools/rev/cc87bf89b445 docs: add documentation for Kafka/Zookeeper upgrade process r=gps