Closed
Bug 1487184
Opened 7 years ago
Closed 7 years ago
Investigate the logging-cep.prod.mozaws.net backpressure warnings
Categories
(Data Platform and Tools :: Monitoring & Alerting, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: trink, Assigned: trink)
Details
(Whiteboard: DataOps)
This data analysis occasionally cannot keep up with the stream of data.
Assignee
Updated•7 years ago
Assignee: nobody → mtrinkala
Points: --- → 2
Priority: -- → P1
Whiteboard: DataOps
Assignee
Comment 1•7 years ago
The prod instance type should be upgraded; the CPU is running at 100%.
current: c3.2xlarge
proposed: c4.4xlarge (Hindsight analysis_thread configuration should be bumped to 16)
Comment 2•7 years ago
We can adjust the instance size with https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/logging/ansible/envs/logging.prod.yml#L1; however, since this has been set up with autoscaling, it will replace the currently running instance. If we are concerned about maintaining state, we need to figure out how best to handle the migration to the new instance.
We could also potentially make use of 'stack' and create additional prod instances for specific use cases. I am not sure if this isolation is desired.
The analysis_thread is currently configured to the number of processors on the instance, so no additional modification is needed here [1].
[1] https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/logging/puppet/modules/cep/templates/hindsight/hindsight.cfg.erb#L12
Assignee
Comment 3•7 years ago
The trigger for this bug was a message with over 11K fields, which made the message cache very large. The entire cache was then cleared for every new message, so CPU time was being spent in the memset. Now only the part of the cache being used is cleared. This fix should improve message throughput for just about all use cases.
https://github.com/mozilla-services/lua_sandbox/pull/232/files
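For readers who do not want to open the pull request, here is a minimal sketch of the idea described above. It is not the actual lua_sandbox code: the type and function names (field_cache, cache_entry, cache_put, clear_all, clear_used) and the capacity are hypothetical, chosen only to contrast clearing the whole cache with clearing just the portion in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity: assume the cache grew to fit one very large message. */
#define CACHE_CAPACITY 16384

/* Hypothetical entry layout, for illustration only. */
typedef struct {
    int id;
    int value;
} cache_entry;

typedef struct {
    cache_entry entries[CACHE_CAPACITY];
    size_t used; /* entries populated by the current message */
} field_cache;

/* Record one field in the next free slot. */
void cache_put(field_cache *c, int id, int value)
{
    assert(c->used < CACHE_CAPACITY);
    c->entries[c->used].id = id;
    c->entries[c->used].value = value;
    c->used++;
}

/* Old behavior (as described): zero the whole array for every message.
 * The cost depends on the capacity, not on the message size.
 * Returns the number of bytes cleared. */
size_t clear_all(field_cache *c)
{
    memset(c->entries, 0, sizeof c->entries);
    c->used = 0;
    return sizeof c->entries;
}

/* New behavior (as described): zero only the entries that were used.
 * The cost depends on the size of the previous message.
 * Returns the number of bytes cleared. */
size_t clear_used(field_cache *c)
{
    size_t bytes = c->used * sizeof c->entries[0];
    assert(c->used <= CACHE_CAPACITY);
    memset(c->entries, 0, bytes);
    c->used = 0;
    return bytes;
}
```

In this sketch, a message that touches two entries costs a two-entry memset with clear_used, while clear_all always pays for all 16384 entries, which is the kind of per-message overhead the comment attributes to the oversized cache.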
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED