
Autoscale the telemetry edge server

Status: NEW (Unassigned)
Product: Data Platform and Tools
Component: Pipeline Ingestion
Priority: P3
Severity: normal
Reported: 5 months ago
Modified: 5 months ago
Reporter: whd
Whiteboard: [SvcOps]

Description, by whd (Reporter), 5 months ago
Previously, the ingestion stack was fronted by a tee server that autoscaled based on load. Now that the edge server is running nginx again, it should autoscale as well.

This is very low-priority work: the edge is efficient enough that we are unlikely to ever need more than the standard three instances we run for reliability. For instance, when we recently DDoS'd ourselves (bug #1353364), the edge handled that traffic easily. We may even want to consider switching to a smaller instance type. Either way, the edge's bottleneck will be network throughput, so the autoscaling rules should be based on that.
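A minimal sketch of how a throughput-based scaling rule might size the edge fleet, assuming an AWS deployment where the input is fleet-wide inbound bytes per second (for example, derived from a per-instance inbound-bytes metric such as EC2's `NetworkIn`). Only the floor of three instances comes from this bug. `TARGET_BYTES_IN_PER_INSTANCE`, `MAX_INSTANCES`, and `desired_instances` are hypothetical. The rule works like target tracking: it keeps average per-instance throughput at or below a target, then clamps the result to the fleet bounds.

```python
import math

# Floor from the bug: three instances for reliability.
MIN_INSTANCES = 3
# Hypothetical values for illustration only; not taken from the bug.
MAX_INSTANCES = 10
TARGET_BYTES_IN_PER_INSTANCE = 200 * 1024 * 1024  # assumed target, bytes/s per instance


def desired_instances(total_bytes_in_per_sec: float,
                      target_per_instance: float = TARGET_BYTES_IN_PER_INSTANCE,
                      min_instances: int = MIN_INSTANCES,
                      max_instances: int = MAX_INSTANCES) -> int:
    """Return a fleet size that keeps average inbound throughput per
    instance at or below target_per_instance, clamped to the bounds."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Smallest instance count whose combined target covers current traffic.
    needed = math.ceil(total_bytes_in_per_sec / target_per_instance)
    # Never drop below the reliability floor or exceed the cap.
    return max(min_instances, min(max_instances, needed))
```

For example, `desired_instances(0)` stays at the floor of 3, while traffic at 3.5 times the per-instance target yields 4.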

The only thing I would want to re-verify is that the S3 upload works correctly when instances are terminated by scaling events.

Updated 5 months ago:
Component: Metrics: Pipeline → Pipeline Ingestion
Product: Cloud Services → Data Platform and Tools