Closed Bug 784083 Opened 12 years ago Closed 12 years ago

token: Deploy tokenserver 1.1-2 to dev/stage token servers

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jbonacci, Assigned: bobm)

Details

(Whiteboard: [qa+])

Attachments

(1 file)

Please deploy tokenserver 1.0 to the Dev and Stage token servers:
Dev: token{1..3}.reg.mtv1.dev.svc.mozilla.com
Stage: token{1..3}.reg.scl2.stage.svc.mozilla.com
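The host lists above use shell brace expansion, so token{1..3} stands for token1, token2 and token3. A minimal Python sketch of that expansion, for anyone scripting against these hosts; the helper name expand_hosts is hypothetical and only handles a single ascending numeric {start..end} range, not full bash brace syntax:

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one ascending numeric {start..end} range, as bash brace expansion would."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


dev = expand_hosts("token{1..3}.reg.mtv1.dev.svc.mozilla.com")
stage = expand_hosts("token{1..3}.reg.scl2.stage.svc.mozilla.com")
for host in dev + stage:
    print(host)
```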
Assignee: nobody → bobm
QA Contact: jbonacci
Status: NEW → ASSIGNED
:bobm, I just need some clarity on the RPM versioning:
<bobm> Cool, the RPMs built as 0.1.0-1
vs.
<tarek> I am pretty sure he meant 1.0-1
Whiteboard: [qa+]
This version of Token Server has a problem that makes it undeployable: https://bugzilla.mozilla.org/show_bug.cgi?id=784566
Depends on: 784566
Tagged rpm-1.1 with the updated PyBrowserID dependency; this should work for you now.
This should be in Dev and Stage per :bobm. Verifying now.
Summary: token: Deploy tokenserver 1.0-1 to dev/stage token servers → token: Deploy tokenserver 1.1-0 to dev/stage token servers
Summary: token: Deploy tokenserver 1.1-0 to dev/stage token servers → token: Deploy tokenserver 1.1-1 to dev/stage token servers
This has been deployed. The Nagios alert on token2 in stage has cleared.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Verified this has been deployed to Dev and to Stage:

Dev: token*.reg.mtv1.dev.svc.mozilla.com
python26-pybrowserid-0.8.0-1.noarch
python26-tokenserver-1.1-1.noarch
python26-tokenlib-0.1.0-1.noarch
(13 of these) /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
plus couchbase, statsd, powerhose worker and broker, nginx, logstash, circusd, puppetd, etc.

Stage: token*.reg.scl2.stage.svc.mozilla.com
python26-pybrowserid-0.8.0-1.noarch
python26-tokenlib-0.1.0-1.noarch
python26-tokenserver-1.1-1.noarch
(13 of these) /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
plus couchbase, statsd, powerhose worker and broker, nginx, logstash, circusd, puppetd, etc.
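The verification above boils down to comparing installed RPM name-version-release.arch strings against the expected versions. A minimal Python sketch, assuming the package list has been captured as plain text (for example from something like rpm -qa 'python26-*'); the helper names parse_nvra, find_mismatches and expected_gunicorn_processes are hypothetical, and the expected versions are the ones listed in the comment. It also reflects why 13 gunicorn processes appear with -w 12: gunicorn runs one master (arbiter) process in addition to its workers.

```python
# Expected version-release strings, taken from the verification comment.
EXPECTED = {
    "python26-tokenserver": "1.1-1",
    "python26-pybrowserid": "0.8.0-1",
    "python26-tokenlib": "0.1.0-1",
}

# Package strings as listed for Stage in the comment.
stage_packages = [
    "python26-pybrowserid-0.8.0-1.noarch",
    "python26-tokenlib-0.1.0-1.noarch",
    "python26-tokenserver-1.1-1.noarch",
]


def parse_nvra(package: str) -> tuple[str, str, str, str]:
    """Split an RPM 'name-version-release.arch' string into its four parts."""
    nvr, arch = package.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def find_mismatches(installed: list[str], expected: dict[str, str]) -> dict[str, str]:
    """Return {name: installed version-release or 'missing'} for every mismatch."""
    found = {}
    for package in installed:
        name, version, release, _arch = parse_nvra(package.strip())
        found[name] = f"{version}-{release}"
    return {
        name: found.get(name, "missing")
        for name, wanted in expected.items()
        if found.get(name) != wanted
    }


def expected_gunicorn_processes(workers: int) -> int:
    """gunicorn runs one master (arbiter) process in addition to its workers."""
    return workers + 1


print(find_mismatches(stage_packages, EXPECTED))
# After the ticket was reopened, the target became tokenserver 1.1-2:
print(find_mismatches(stage_packages, {**EXPECTED, "python26-tokenserver": "1.1-2"}))
print(expected_gunicorn_processes(12))
```

Checking against the 1.1-2 target flags python26-tokenserver as still at 1.1-1, which is the kind of mismatch the reopened deployment was tracking.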
Status: RESOLVED → VERIFIED
This build was showing some errors due to an old version of metlog. Reopening to track deployment of rpm-1.1-2, which has an updated metlog dependency.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Summary: token: Deploy tokenserver 1.1-1 to dev/stage token servers → token: Deploy tokenserver 1.1-2 to dev/stage token servers
Deployed to stage. In process of deploying to dev.
Verified that we have the right RPM in Stage. Waiting for :bobm to finish Dev and update the ticket; then I will verify Dev.
Status: REOPENED → ASSIGNED
So, I still cannot get a clean run in Stage. Load tests with and without the -r option are getting 503s in Stage. Relevant load tests can be seen in the Pencil graphs for Wednesday, 8/29/2012, 4pm to 6pm PDT.
Pencil graphs showing load test statistics during the server crash.

A tcpdump command was being run to verify whether the token server was using gethostbyname(3) or otherwise making use of DNS caching through NSCD. All three token servers crashed under moderate load at ~16:37. It appears tcpdump caused a massive backlog of auditd reports and a very high volume of outbound audit messages to ArcSight:

Aug 29 08:36:16 token1 kernel: audit: audit_lost=187 audit_rate_limit=200 audit_backlog_limit=320
Aug 29 08:36:16 token1 kernel: audit: rate limit exceeded
Aug 29 08:36:18 token1 snmpd[2204]: NET-SNMP version 5.5
Aug 29 16:36:19 token1 ntpdate[2240]: step time server 10.14.200.6 offset 28799.965615 sec
Aug 29 16:36:19 token1 ntpd[2251]: ntpd 4.2.4p8@1.1612-o Thu May 13 14:38:25 UTC 2010 (1)
Aug 29 16:36:19 token1 ntpd[2252]: precision = 0.371 usec

Note that at ~20 QPS, network traffic peaked at ~40M during the crash. Later, without tcpdump, when QPS reached as high as 120, network traffic was under 5M.
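Two bits of arithmetic help read the log and traffic figures above. The ntpdate step of 28799.965615 sec is almost exactly 8 hours, which accounts for the syslog timestamps jumping from 08:36 to 16:36 across the step. The traffic numbers can also be normalised per query; the unit behind "M" is not stated in the comment, so this Python sketch keeps the values unitless, and the variable names are illustrative only. Because "~40M" is approximate and "under 5M" is an upper bound, the resulting ratio is a rough lower-bound estimate, not a measurement.

```python
from datetime import datetime, timedelta

# Clock step reported by ntpdate in the log above.
step = timedelta(seconds=28799.965615)
step_hours = step.total_seconds() / 3600
print(f"ntpdate step: {step_hours:.4f} hours")

# Apply the step to the last pre-step syslog timestamp (snmpd at 08:36:18;
# the year 2012 is taken from the ticket). The result lands just before
# 16:36:18, within a couple of seconds of the 16:36:19 ntpdate entry.
before_step = datetime(2012, 8, 29, 8, 36, 18)
after_step = before_step + step
print("08:36:18 +", step, "->", after_step.strftime("%H:%M:%S"))

# Traffic per query; the unit of "M" is not stated, so values stay unitless.
crash_qps, crash_traffic = 20, 40        # ~20 QPS, ~40M peak (with tcpdump)
normal_qps, normal_traffic_max = 120, 5  # up to 120 QPS, under 5M (no tcpdump)
per_query_crash = crash_traffic / crash_qps
per_query_normal_max = normal_traffic_max / normal_qps
ratio = per_query_crash / per_query_normal_max
print(f"per query during crash: {per_query_crash:.3f}M")
print(f"per query without tcpdump: < {per_query_normal_max:.3f}M")
print(f"traffic per query during the crash was roughly {ratio:.0f}x higher or more")
```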
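The audit "flag" value (the failure mode) is set to 1, which means printk: lost audit messages are discarded and reported via printk (as in the audit_lost kernel messages above) rather than triggering a panic. If the audit subsystem was involved in the panic, it did not panic on purpose. (A deliberate panic is only an option when the "flag" value is set to 2.)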
This was installed in Dev on Thu 30 Aug 2012 03:45:18 PM PDT. Setting ticket to RESOLVED FIXED.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED