Closed Bug 784083 — Opened 12 years ago, Closed 12 years ago
token: Deploy tokenserver 1.1-2 to dev/stage token servers
Categories: Cloud Services :: Operations: Deployment Requests - DEPRECATED (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: jbonacci; Assigned: bobm
Whiteboard: [qa+]
Attachments: 1 file (63.54 KB, application/x-gzip)
Description
Please deploy tokenserver 1.0 to Dev and Stage token*
Dev:
token{1..3}.reg.mtv1.dev.svc.mozilla.com
Stage:
token{1..3}.reg.scl2.stage.svc.mozilla.com
Reporter
Updated • 12 years ago
Assignee: nobody → bobm
QA Contact: jbonacci
Assignee
Updated • 12 years ago
Status: NEW → ASSIGNED
Reporter
Comment 1 • 12 years ago
:bobm, we just need some clarity on the RPM versioning:
< bobm> Cool, the RPMs built as 0.1.0-1
vs.
< tarek> I am pretty sure he meant 1.0-1
Whiteboard: [qa+]
Assignee
Comment 2 • 12 years ago
This version of Token Server has a problem that makes it undeployable: https://bugzilla.mozilla.org/show_bug.cgi?id=784566
Comment 3 • 12 years ago
Tagged rpm-1.1 with the updated dependency on PyBrowserID; this should work for you now.
Reporter
Comment 4 • 12 years ago
This should be in Dev and Stage per :bobm. Verifying now.
Summary: token: Deploy tokenserver 1.0-1 to dev/stage token servers → token: Deploy tokenserver 1.1-0 to dev/stage token servers
Reporter
Updated • 12 years ago
Summary: token: Deploy tokenserver 1.1-0 to dev/stage token servers → token: Deploy tokenserver 1.1-1 to dev/stage token servers
Assignee
Comment 5 • 12 years ago
This has been deployed. The Nagios alert on token2 in stage has cleared.
Reporter
Updated • 12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter
Comment 6 • 12 years ago
Verified this has been deployed to Dev and to Stage:
Dev:
token*.reg.mtv1.dev.svc.mozilla.com
python26-pybrowserid-0.8.0-1.noarch
python26-tokenserver-1.1-1.noarch
python26-tokenlib-0.1.0-1.noarch
(13 of these)
/usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
plus couchbase, statsd, powerhose worker and broker, nginx, logstash, circusd, puppetd, etc
Stage:
token*.reg.scl2.stage.svc.mozilla.com
python26-pybrowserid-0.8.0-1.noarch
python26-tokenlib-0.1.0-1.noarch
python26-tokenserver-1.1-1.noarch
(13 of these)
/usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
plus couchbase, statsd, powerhose worker and broker, nginx, logstash, circusd, puppetd, etc
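The package check behind this verification can be sketched as a one-line filter. The package names and the pattern are taken from the listing above; on a real token host you would pipe `rpm -qa` into the grep, but here a sample package list (including a hypothetical `openssl` package as a non-match) stands in so the sketch runs anywhere:

```shell
# Stand-in for `rpm -qa` so the filter can be tried on any machine.
printf '%s\n' \
    python26-pybrowserid-0.8.0-1.noarch \
    python26-tokenserver-1.1-1.noarch \
    python26-tokenlib-0.1.0-1.noarch \
    openssl-1.0.1e-1.noarch \
  | grep -E 'tokenserver|tokenlib|pybrowserid'
# Prints only the three Token Server stack packages; openssl is filtered out.
```

The same `grep -E` pattern works against the live system (`rpm -qa | grep -E 'tokenserver|tokenlib|pybrowserid'`), and `pgrep -f 'tokenserver.run:application'` would confirm the gunicorn workers are up.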
Status: RESOLVED → VERIFIED
Comment 7 • 12 years ago
This build was showing some errors due to an old version of metlog; reopening to track deployment of rpm-1.1-2, which has an updated metlog dependency.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Summary: token: Deploy tokenserver 1.1-1 to dev/stage token servers → token: Deploy tokenserver 1.1-2 to dev/stage token servers
Assignee
Comment 8 • 12 years ago
Deployed to Stage. Deployment to Dev is in progress.
Reporter
Comment 9 • 12 years ago
Verified that we have the right RPM in Stage.
Waiting for :bobm to finish Dev and update the ticket; then I will verify Dev.
Status: REOPENED → ASSIGNED
Reporter
Comment 10 • 12 years ago
I still cannot get a clean run in Stage.
Load tests both with and without the -r option are returning 503s in Stage.
Relevant load tests can be seen in the Pencil graphs for Wednesday, 8/29/2012, 4pm to 6pm PDT.
Assignee
Comment 11 • 12 years ago
Pencil graphs showing load-test statistics during the server crash. A tcpdump command was being run to verify that Token Server was using gethostbyname(3) or otherwise making use of DNS caching through nscd. All three token servers crashed under moderate load at ~16:37.
It appears tcpdump caused a massive backlog in auditd reports, and a very high outbound rate of audit messages to ArcSight:
Aug 29 08:36:16 token1 kernel: audit: audit_lost=187 audit_rate_limit=200 audit_backlog_limit=320
Aug 29 08:36:16 token1 kernel: audit: rate limit exceeded
Aug 29 08:36:18 token1 snmpd[2204]: NET-SNMP version 5.5
Aug 29 16:36:19 token1 ntpdate[2240]: step time server 10.14.200.6 offset 28799.965615 sec
Aug 29 16:36:19 token1 ntpd[2251]: ntpd 4.2.4p8@1.1612-o Thu May 13 14:38:25 UTC 2010 (1)
Aug 29 16:36:19 token1 ntpd[2252]: precision = 0.371 usec
Note that at ~20 QPS, network traffic peaked at ~40M during the crash. Later, without tcpdump running, network traffic stayed under 5M even as QPS reached 120.
Assignee
Comment 12 • 12 years ago
The auditd failure "flag" is set to 1, which means lost audit messages are reported via printk (a setting of 0 would silently discard them). If the audit subsystem was involved in the panic, it did not panic on purpose: deliberate panic-on-failure is the behavior selected by setting the flag to 2.
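For reference, the failure flag and the limits that appear in the Comment 11 kernel log lines correspond to auditctl options. A sketch of the relevant knobs, with semantics taken from auditctl(8) rather than from this ticket (these commands require root, so this is a config fragment, not something to run blindly):

```shell
# auditd failure flag: 0 = silently discard lost messages,
#                      1 = printk a warning (the setting in effect here),
#                      2 = kernel panic on audit failure
auditctl -f 1

# Rate and backlog limits matching the values in the log
# ("audit_rate_limit=200 audit_backlog_limit=320"):
auditctl -r 200   # max audit messages per second before rate limiting
auditctl -b 320   # max outstanding audit buffers (the backlog limit)

auditctl -s       # print the current audit status, including flag/rate/backlog
```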
Assignee
Comment 13 • 12 years ago
This was installed in Dev: Thu 30 Aug 2012 03:45:18 PM PDT.
Setting ticket to RESOLVED → FIXED.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED