Closed Bug 969184 (Opened 11 years ago; Closed 11 years ago)
Update tokenserver to use heka instead of metlog
Categories: Cloud Services Graveyard :: Server: Token (defect, P1)
Status: VERIFIED FIXED
People: Reporter: rfkelly; Assigned: rfkelly
Whiteboard: [qa+]
Attachments (1 file):
patch, 13.02 KB (telliott: review+)
Metlog out, heka in. :RaFromBRC says this should be a pretty straightforward search-and-replace.
Comment 2 (Assignee) • 11 years ago
Some thoughts on the approach here, from IRC:
<RaFromBRC> probably the least amount of work is to replace metlog-py w/ heka-py, w/ minimal adjustments to get it working as before
<rfkelly> *nod*
<RaFromBRC> but that was all done a long time ago, heka's a different beast now, and we're not so focused on a custom protocol or a custom client these days
<RaFromBRC> it might make sense to write to files and parse, or to use syslog, etc.
<rfkelly> ok; more in line with what fxa-auth-server does
<rfkelly> and let heka pull it in from wherever
<RaFromBRC> right
<rfkelly> that sounds simpler longer-term
<RaFromBRC> yeah, depends on the priorities
<RaFromBRC> rfkelly: deciding what to do would be a matter of looking at what data is being pushed through the metlog client now, plus any other data that we know we'd want to push through heka but aren't yet
<RaFromBRC> and looking at it from above to put together a strategy for how to get it all there
<rfkelly> pretty sure it's limited to timing of various things, and application logs (e.g. tracebacks, warnings, etc)
<RaFromBRC> yeah, the decorators
<RaFromBRC> doing the timings
<rfkelly> if the heka-py transition would be pretty easy, it may be worth doing that first regardless, then refactor from there
<RaFromBRC> yeah, i think that's probably wise
<rfkelly> which would also help us understand/remember exactly how it's used so far
<RaFromBRC> bingo
<rfkelly> (I'm going to snapshot this into my bugs for reference)
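For reference, a minimal sketch of the timing-decorator pattern mentioned above, rebuilt on stdlib logging (the decorator name, logger name, and log format here are illustrative assumptions, not the actual metlog-py API):

    import functools
    import json
    import logging
    import time

    logger = logging.getLogger("tokenserver")  # hypothetical logger name

    def timed(name):
        """Log how long the wrapped function takes, in milliseconds."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.time()
                try:
                    return func(*args, **kwargs)
                finally:
                    elapsed_ms = (time.time() - start) * 1000
                    logger.info(json.dumps({"timer": name, "ms": elapsed_ms}))
            return wrapper
        return decorator

    @timed("verify_assertion")
    def verify_assertion(assertion):
        ...  # e.g. call out to the browserid verifier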
Comment 3 • 11 years ago
Some feedback from the ops side:
- We're seeing high CPU usage under load testing. We haven't profiled the code, but we assume the combination of metlog + circus + stdout-to-file logging is producing the very high load.
- We've seen something like this before in campaign manager, where logging to disk caused high CPU load.
- We'd like to see how this behaves under heka-py.
For implementation:
- If we go with a file stream, do we have to worry about file rotation?
- It would be nicer for ops if we streamed this into heka directly via UDP to 127.0.0.1; then the two are decoupled for the most part. Any other input method that doesn't require us to worry about IO performance, disk space usage, or file rotation would also work (see the sketch below).
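To make the UDP idea concrete, here is a minimal sketch using a custom stdlib logging handler; the handler class, port number, and JSON payload shape are all assumptions for illustration, and heka's actual UdpInput/decoder config would define the real contract:

    import json
    import logging
    import socket

    class UdpJsonHandler(logging.Handler):
        """Send each log record as a JSON datagram to a local heka UdpInput."""

        def __init__(self, host="127.0.0.1", port=5565):  # port is a placeholder
            super().__init__()
            self.addr = (host, port)
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        def emit(self, record):
            payload = json.dumps({
                "logger": record.name,
                "severity": record.levelno,
                "message": record.getMessage(),
            })
            # Fire-and-forget: UDP never blocks the app on disk or network IO.
            self.sock.sendto(payload.encode("utf-8"), self.addr)

    logger = logging.getLogger("tokenserver")
    logger.addHandler(UdpJsonHandler())
    logger.setLevel(logging.INFO)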
Updated • 11 years ago
Priority: -- → P2
Comment 4 • 11 years ago
Update re: high CPU.
The cause was that tokenserver was not using HTTPS connection pooling. Creating a new SSL connection to the verifier on every request was crushing the box.
We still want the conversion to heka-py, though, to help with debugging.
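For reference, a minimal sketch of the connection-pooling fix using requests (the verifier URL is a placeholder, and the actual tokenserver code may use a different HTTP client):

    import requests

    # A Session keeps a pool of HTTPS connections alive, so repeated calls to
    # the verifier reuse an existing TLS connection instead of paying the full
    # SSL handshake cost on every request.
    session = requests.Session()

    VERIFIER_URL = "https://verifier.example.com/verify"  # placeholder

    def verify(assertion, audience):
        resp = session.post(VERIFIER_URL, json={
            "assertion": assertion,
            "audience": audience,
        })
        resp.raise_for_status()
        return resp.json()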
Comment 5 • 11 years ago
Related GitHub issues:
https://github.com/mozilla-services/puppet-config/issues/81
https://github.com/mozilla-services/puppet-config/issues/206
https://github.com/mozilla-services/puppet-config/issues/287
https://github.com/mozilla-services/puppet-config/pull/318
Status: NEW → ASSIGNED
Priority: P2 → P1
Updated • 11 years ago
QA Contact: jbonacci
Comment 6 • 11 years ago
Updated list of related GitHub issues:
https://github.com/mozilla-services/puppet-config/issues/81
https://github.com/mozilla-services/puppet-config/issues/206
https://github.com/mozilla-services/puppet-config/issues/287
https://github.com/mozilla-services/puppet-config/pull/317
https://github.com/mozilla-services/puppet-config/pull/318
Updated • 11 years ago
Assignee: nobody → rfkelly
Comment 7 • 11 years ago
Is bug 998054 a potential dup of this?
Comment 8 (Assignee) • 11 years ago
This patch updates tokenserver for the new simplified metrics infra proposed in Bug 1012509. It's almost all just replacing metlog with stdlib logging, plus tweaking the details of a few timers.
The missing half of this is a deploy config change to make it use the JSON logger in production, and then trying it out in stage to see whether heka can slurp the logs in properly.
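For context, a minimal sketch of what "stdlib logging plus a JSON logger" can look like (the formatter class and field names here are illustrative assumptions, not the patch's actual implementation; the real schema would come from the deploy config mentioned above):

    import json
    import logging

    class JsonFormatter(logging.Formatter):
        """Render each record as one JSON object per line, for heka to parse."""

        def format(self, record):
            return json.dumps({
                "name": record.name,
                "levelname": record.levelname,
                "msg": record.getMessage(),
                "time": self.formatTime(record),
            })

    handler = logging.StreamHandler()  # stdout; captured and shipped in production
    handler.setFormatter(JsonFormatter())
    logging.getLogger("tokenserver").addHandler(handler)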
Attachment #8425266 - Flags: review?(telliott)
Updated • 11 years ago
Attachment #8425266 - Flags: review?(telliott) → review+
Comment 9 (Assignee) • 11 years ago
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 10 • 11 years ago
I can verify this when we deploy bug 1014496
Comment 11 • 11 years ago
Verified heka is in use via the shared heka dashboard...
Status: RESOLVED → VERIFIED
Updated • 2 years ago
Product: Cloud Services → Cloud Services Graveyard