Closed Bug 1487184 Opened 7 years ago Closed 7 years ago

Investigate the logging-cep.prod.mozaws.net backpressure warnings

Categories

(Data Platform and Tools :: Monitoring & Alerting, enhancement, P1)

enhancement
Points:
2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: trink, Assigned: trink)

Details

(Whiteboard: DataOps)

This data analysis occasionally cannot keep up with the stream of data.
Assignee: nobody → mtrinkala
Points: --- → 2
Priority: -- → P1
Whiteboard: DataOps
The prod instance type should be upgraded; the CPU is running at 100%. Current: c3.2xlarge. Proposed: c4.4xlarge (the Hindsight analysis_thread configuration should be bumped to 16).
We can adjust the instance size in https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/logging/ansible/envs/logging.prod.yml#L1; however, since this has been set up with autoscaling, changing it will replace the currently running instance. If we are concerned about maintaining state, we need to figure out how best to handle the migration to the new instance. We could also potentially make use of 'stack' and create additional prod instances for specific use cases, though I am not sure whether this isolation is desired. The analysis_thread setting is currently configured to the number of processors on the instance, so no additional modification is needed here [1]. [1] https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/logging/puppet/modules/cep/templates/hindsight/hindsight.cfg.erb#L12
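The "threads follow the processor count" rule can be illustrated with a small C sketch. This is not the Hindsight or Puppet template code: analysis_threads_for and default_analysis_threads are hypothetical helpers, and sysconf(_SC_NPROCESSORS_ONLN) is simply the usual Linux/macOS way to ask for the number of online CPUs. A c3.2xlarge has 8 vCPUs and a c4.4xlarge has 16, so a thread count derived from the processor count would move from 8 to the proposed 16 on its own after the instance upgrade.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: one analysis thread per online CPU, never fewer than one. */
static long analysis_threads_for(long online_cpus)
{
    return online_cpus > 0 ? online_cpus : 1;
}

/* Hypothetical helper: query the OS for the online CPU count.
   sysconf() returns -1 on error, which analysis_threads_for() maps to 1 thread. */
static long default_analysis_threads(void)
{
    long n = analysis_threads_for(sysconf(_SC_NPROCESSORS_ONLN));
    assert(n >= 1);
    return n;
}
```

With this rule, the same configuration yields 8 threads on a c3.2xlarge and 16 on a c4.4xlarge, without editing the thread setting by hand.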
The trigger for this bug was a message with over 11K fields, which made the message cache very large. The entire cache was then cleared for every new message, so CPU time was being spent in the memset. Now only the part of the cache that is actually in use is cleared. This fix should improve message throughput for just about all use cases. https://github.com/mozilla-services/lua_sandbox/pull/232/files
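A minimal C sketch of the idea behind the fix, assuming a simple fixed-capacity array of field slots. The names field_slot, field_cache, cache_init, cache_add, cache_reset_all and cache_reset_used are hypothetical; the actual lua_sandbox code in the linked pull request is organized differently. What the sketch shows is the cost model: clearing the whole cache costs O(capacity) per message, so one 11K-field message makes every later message pay for 11K slots, while clearing only the used prefix costs O(fields in the previous message).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache slot: one decoded field of the current message. */
typedef struct {
    const char *name; /* field name, NULL when the slot is empty */
    size_t len;       /* cached value length */
} field_slot;

/* Hypothetical cache: capacity grows to fit the largest message seen. */
typedef struct {
    field_slot *slots;
    size_t capacity;
    size_t used; /* slots filled by the current message */
} field_cache;

/* Zero the whole array once, up front; this establishes the invariant
   that every slot at index >= used is already zero. */
static void cache_init(field_cache *c, field_slot *slots, size_t capacity)
{
    assert(slots != NULL);
    c->slots = slots;
    c->capacity = capacity;
    c->used = 0;
    memset(slots, 0, capacity * sizeof *slots);
}

/* Record one field of the current message; returns -1 when the cache is full. */
static int cache_add(field_cache *c, const char *name, size_t len)
{
    if (c->used == c->capacity) return -1;
    c->slots[c->used].name = name;
    c->slots[c->used].len = len;
    c->used++;
    return 0;
}

/* Before the fix: clear the entire cache for every new message, O(capacity). */
static void cache_reset_all(field_cache *c)
{
    memset(c->slots, 0, c->capacity * sizeof *c->slots);
    c->used = 0;
}

/* After the fix: clear only the slots the previous message used, O(used).
   Slots past `used` are still zero, so the result matches cache_reset_all. */
static void cache_reset_used(field_cache *c)
{
    memset(c->slots, 0, c->used * sizeof *c->slots);
    c->used = 0;
}
```

Both reset functions leave the cache in the same all-zero state. The cheaper one is only correct because of the invariant that unused slots stay zero, which is why cache_init clears the array once at creation.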
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED