Closed Bug 1179136 Opened 9 years ago Closed 9 years ago

Update CEP configuration for GA

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: whd, Assigned: whd)

Details

(Whiteboard: [rC] [unifiedTelemetry])

Wesley Dawson [:whd]

Assignee

Description

•

9 years ago

We have a plan for this from the work week. The short version is: N CEP servers, K Kafka partitions (partitioned by client ID), each reads K/N partitions (the first also reads records with no client ID). Each CEP sends its aggregates to a single CEP aggregator that acts as the current CEP does. This will be implemented by having N ASGs of 1, with user data + local yaml to inform the node which partitions it should consume.

In this configuration, if we lose an instance for some reason, the data will end up looking inconsistent (especially if we lose the aggregator). Since this is only monitoring/ad-hoc analysis we're willing to accept this for GA.

I worked out on the flight back a marginal improvement on the above involving periodic S3 state saves and kafka buffers (allowing us to lose much less state / recover more easily) but I've been writing a lot of bugmail and don't have the bandwidth to present it here.

The CEP nodes will need to be scaled manually horizontally (if vertical scaling on N is not sufficient), but we should have alerting for that. Adding a CEP requires reshuffling the partitions across the nodes (and again, state becomes inconsistent), but as it's monitoring data we're OK with it.

Thomas Huelbert

Updated

•

9 years ago

Assignee: nobody → whd

Priority: -- → P2

Whiteboard: [rC] [unifiedTelemetry]

Wesley Dawson [:whd]

Assignee

Comment 1

•

9 years ago

This is done in the following PRs:

https://github.com/mozilla-services/puppet-config/pull/1517
https://github.com/mozilla-services/svcops/pull/653

I will be filing separate bugs for load testing and deployment of these changes.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Update CEP configuration for GA

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

People

(Reporter: whd, Assigned: whd)

References

Details

(Whiteboard: [rC] [unifiedTelemetry])

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Updated