Closed Bug 1245385 Opened 10 years ago Closed 10 years ago

Deploy Dockerized Tokenserver

Categories

(Cloud Services Graveyard :: Server: Token, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mostlygeek, Assigned: mostlygeek)

References

Details

Attachments

(2 files)

Task:
- Convert tokenserver to deploy as a Docker container
- QA / load test it in stage
- Create a new Jenkins deployment pipeline
- Deploy it to production
- Run it for a week beside the current RPM based stack as 1 box
- Scale the RPM stack down to 1 box, enable auto-scaling for the docker stack
- Remove the RPM stack
Assignee: nobody → bwong
QA Contact: kthiessen
woot
Opened Bug 1246008 to add a secondary load balancer specific health check.
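For context, a load-balancer-specific health check is typically a shallow endpoint kept separate from the deeper application heartbeat, so the LB can poll cheaply without exercising backend dependencies. A minimal stdlib-only WSGI sketch of the idea (the paths and behavior here are illustrative assumptions, not the actual tokenserver code):

```python
# Hypothetical sketch of a secondary, LB-specific health check (Bug 1246008).
# Route names follow a common Mozilla convention but are assumptions here.
def health_app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path == "/__lbheartbeat__":
        # Shallow check for the load balancer: the process is up and serving.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    if path == "/__heartbeat__":
        # Deep check: a real service would verify DB connectivity etc. here.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

The point of the split is that the LB check stays green during a transient backend hiccup, while the deep heartbeat is what monitoring alerts on.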
Will be getting logs of the testing in stage in tonight as attachments to this bug; they need a little cleanup first.
These are not exactly the most readable, but they capture all the tester-side data from the testing yesterday. Narrative comments are introduced by ##.
Deployed a canary box to production today. Noticed that it used about 50% more CPU than the other boxes. Looking at the build logs [1] for the container (mozilla/browserid-verifier:0.3.0), it seems that node 4 has issues building the bigint library. Switched the base container back to node 0.10.41 [2]; the bigint library appears to build correctly with it. We want to schedule another load test in stage with the same number of servers and the same RDS instance:
- 9 x c3.large web servers
- 1 x m3.large RDS w/ 1TB gp2
to see how it compares.

[1] https://circleci.com/gh/mozilla/browserid-verifier/6
[2] https://github.com/mozilla/browserid-verifier/pull/77
Found a few issues with the production canary today.

Issue #1 - Disk full. By default the docker daemon logs to a JSON file on disk that never rotates. The fix: in /etc/sysconfig/docker, add `--log-driver=none`:
- OPTIONS='--selinux-enabled --log-driver=none'

Issue #2 - CPU use of the verifier:
- deployed the mozilla/browserid-verifier:0.3.2 container
- CPU usage is now in line with the RPM based servers
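For reference, a sketch of the resulting daemon config. The `--selinux-enabled` value comes from the line above; anything else in the file is an assumption:

```
# /etc/sysconfig/docker
# Disable the default json-file log driver so container stdout/stderr is not
# also written to an unrotated JSON file on disk; systemd still captures the
# same output in the journal when containers run as systemd units.
OPTIONS='--selinux-enabled --log-driver=none'
```

Note this is a daemon-wide default; the same flag can also be passed per-container to `docker run`.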
Flags: needinfo?(ckolos)
:ckolos could you create a new docker AMI with `--log-driver=none`? Since we run docker containers as systemd units, all the logs to stdout/stderr are already captured in journalctl.
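A sketch of the systemd-unit setup being described; the unit name, image, and port here are hypothetical, not the actual tokenserver deployment:

```
# /etc/systemd/system/tokenserver.service (hypothetical unit name and image)
# Running the container in the foreground under systemd means its
# stdout/stderr flows into the journal, making docker's own json-file
# logging redundant (hence --log-driver=none).
[Unit]
Description=Dockerized Tokenserver
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f tokenserver
ExecStart=/usr/bin/docker run --name tokenserver --log-driver=none \
    -p 8000:8000 mozilla/tokenserver:latest
ExecStop=/usr/bin/docker stop tokenserver
Restart=always

[Install]
WantedBy=multi-user.target
```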
I agree that the log output should be handled in some way, as the current behavior is definitely not good. However, I disagree that we should set log-driver=none; I'd leave log-driver at its default instead. I suggest setting --log-opt max-size=<some size> or adding LOGROTATE=true to the end of /etc/sysconfig/docker.
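The suggested alternative would look something like the following; the sizes and the max-file count are illustrative, and `--selinux-enabled` is carried over from the existing config:

```
# /etc/sysconfig/docker -- alternative: keep the default json-file log
# driver but cap and rotate it (values are illustrative)
OPTIONS='--selinux-enabled --log-opt max-size=50m --log-opt max-file=3'
```

The trade-off versus --log-driver=none is that `docker logs` keeps working on these containers, at the cost of writing each log line twice (journal + json-file).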
Flags: needinfo?(ckolos)
Update: We finally have a good version of the dockerized Tokenserver in prod as a single-server canary. I will be running this for a couple of weeks to see how it compares to the current generation of the service.

A few notes on fixes:
- Run docker containers with --log-driver=none. This will likely be permanent: even though the AMI will rotate docker logs by default, I'd rather the logs weren't written twice and instead went straight to journald via systemd.
- Deployed browserid-verifier:0.3.2, which moves the base container back to node v0.10.41 from node 4, since bigint doesn't compile correctly with node 4 (for now).
- CPU usage on the canary is the same as the previous generation.
- Disk writes (bytes and iops) are much lower. Not sure if this is due to journald logging more efficiently than circus.
Attached a screenshot of the current vs. new dockerized tokenserver. The orange line is the docker based servers.
- CPU usage is about the same. This is good.
- Disk and network usage is much lower. This is interesting; I suspect it is due to writing logs directly to journald instead of having circus write logs to disk. Logs are being shipped correctly, and comparing new/old servers, they have the same amount of logs.
Update to the plan:
- Increase the number of docker based servers in the cluster to 2, 4, and 6 over the next couple of weeks
- When we reach 6, the docker servers should make up most of the cluster, with the RPM based servers coming/going from auto-scaling
- Feb 29th: remove the old servers and run everything from dockerized servers
I decided to accelerate the migration plan because the new docker servers showed very stable and positive results. As of tonight, the dockerized tokenserver instances are serving 100% of production.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1247736
Product: Cloud Services → Cloud Services Graveyard