Closed Bug 1609435 Opened 5 years ago Closed 4 years ago

Track influxdb cardinality

Categories

(Cloud Services :: Operations: Metrics/Monitoring, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brian, Assigned: brian)

Details

We want to be able to alert on changes in the rate of cardinality increase, and to quickly identify what measurements they're occurring on. To do this, we can collect this information from influxdb regularly and write it back as measurements there.

cardinality per db

Tracking things at the database level is straightforward. For each database in show databases:

Run show series cardinality on foo and store influxdb.cardinality.series database=foo cardinality=N

Run show measurement cardinality on foo and store influxdb.cardinality.measurements database=foo cardinality=N
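
As a rough sketch of the "write it back" half, the counts could be turned into line-protocol points like this. It assumes database is stored as a tag and cardinality as an integer field, which the description above does not pin down; the function names and the sample counts are made up for illustration.

```python
def escape_tag(value):
    """Escape the characters that are special in line-protocol tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def cardinality_point(kind, database, count):
    """Build one point; kind is 'series' or 'measurements'.

    Assumption: database is a tag and cardinality is an integer field.
    """
    return "influxdb.cardinality.%s,database=%s cardinality=%di" % (
        kind, escape_tag(database), count)


def points_for_databases(counts):
    """counts maps database name -> (series count, measurement count)."""
    lines = []
    for database, (series, measurements) in sorted(counts.items()):
        lines.append(cardinality_point("series", database, series))
        lines.append(cardinality_point("measurements", database, measurements))
    return lines


if __name__ == "__main__":
    # Hypothetical counts, as if read from the two show ... cardinality queries.
    for line in points_for_databases({"foo": (1200, 15)}):
        print(line)
```

No timestamp is attached, so the server would assign one at write time.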

cardinality per measurement

Tracking series cardinality at the measurement level would also be very useful, but I haven't identified a way to do that. One thing we could potentially do is this:

For each tag in show tag keys:

Run show tag values cardinality with key = "baz" and, for each measurement in the result, store influxdb.cardinality.tag_values database=foo measurement=bar tag=baz cardinality=N

This doesn't let us determine the series cardinality for a specific measurement. For example, if a measurement has two tags with 10 values each, the actual series cardinality could be anywhere from 10 to 100. It would let us track changes in the tag keys and values though, so it's better than nothing.
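
To make the 10-to-100 range concrete, here is a small sketch of the bounds. It assumes every series in the measurement carries every tag key, so the lower bound is the largest per-tag value count and the upper bound is the product of all of them; the function name is made up.

```python
from math import prod


def series_cardinality_bounds(tag_value_counts):
    """Return (lowest, highest) possible series count for one measurement.

    Assumption: every series has a value for every tag key.
    """
    if not tag_value_counts:
        raise ValueError("need at least one tag value count")
    return max(tag_value_counts), prod(tag_value_counts)


if __name__ == "__main__":
    # Two tags with 10 values each, as in the example above.
    print(series_cardinality_bounds([10, 10]))
```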

I am unsure if this is performant enough to actually do. In my limited testing, sometimes these show tag values cardinality queries are reasonably fast, and other times they're quite slow.

If we do this, we should bail out with an error if the number of measurements or the number of tag keys is above some threshold, to avoid having the tracking itself cause a cardinality explosion.
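
A minimal sketch of that guard, to be run before any tracking points are written. The limits (500 measurements, 50 tag keys) are placeholders and not values anyone has agreed on; the names are hypothetical too.

```python
class TrackingLimitExceeded(RuntimeError):
    """Raised when tracking would itself create too many series."""


def check_tracking_limits(measurements, tag_keys,
                          max_measurements=500, max_tag_keys=50):
    """Raise instead of writing if either list is over its threshold.

    The default limits are placeholders, not agreed values.
    """
    if len(measurements) > max_measurements:
        raise TrackingLimitExceeded(
            "%d measurements exceeds limit of %d"
            % (len(measurements), max_measurements))
    if len(tag_keys) > max_tag_keys:
        raise TrackingLimitExceeded(
            "%d tag keys exceeds limit of %d" % (len(tag_keys), max_tag_keys))


if __name__ == "__main__":
    check_tracking_limits(["cpu", "mem"], ["host", "region"])
    print("within limits")
```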

Assignee: nobody → bpitts

I added the telegraf internal plugin by default to our projects in GCP back in December, and to our projects in AWS earlier in the week. It will take some time before we have data from it for most projects.

The internal plugin will give us a measurement of series written per telegraf. This again doesn't get us actual series cardinality over time, since one telegraf could be writing the same 1000 series every minute and another telegraf could be writing a new set of 1000 series every minute, but they would both record 1000 each time.
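
The difference can be shown with a toy simulation. This is not the telegraf internal plugin's code, just two made-up writers: one repeats the same 1000 series each minute, the other writes 1000 new ones each minute.

```python
def simulate(batches):
    """Return (series written per interval, total distinct series seen)."""
    seen = set()
    written = []
    for batch in batches:
        written.append(len(set(batch)))
        seen.update(batch)
    return written, len(seen)


# Writer A repeats the same 1000 series keys for three minutes.
stable = [["s%d" % i for i in range(1000)] for _ in range(3)]
# Writer B writes 1000 brand new series keys every minute.
churning = [["s%d-%d" % (minute, i) for i in range(1000)]
            for minute in range(3)]

if __name__ == "__main__":
    print(simulate(stable))
    print(simulate(churning))
```

Both writers report 1000 series written per interval, but only the second one keeps adding to the database's series cardinality.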

It should still be very helpful for seeing changes in the number of metrics written for a project though, and should help us track down cardinality explosions like the one that happened with sync-rs.

Thanks to help from InfluxData support, I learned that show series cardinality supports a from clause that lets us get what we want, e.g.

> show series cardinality from "cpu"
name: cpu
count
-----
92035

so we can ignore my tag key shenanigans from the bug description and go with recording that for everything from show measurements instead.
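
A sketch of how the per-measurement query could be built and its result read. This is not the code from the PR. The response layout in parse_count is an assumption: it is the usual shape of a 1.x /query JSON response, with a single count column as in the output above. The function names are made up.

```python
def quote_identifier(name):
    """Double-quote an InfluxQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def series_cardinality_query(database, measurement):
    """Build the query for one measurement in one database."""
    return "SHOW SERIES CARDINALITY ON %s FROM %s" % (
        quote_identifier(database), quote_identifier(measurement))


def parse_count(response):
    """Pull the count out of a decoded /query response (assumed layout)."""
    series = response["results"][0].get("series", [])
    if not series:
        return 0  # no rows came back for this measurement
    column = series[0]["columns"].index("count")
    return int(series[0]["values"][0][column])


if __name__ == "__main__":
    print(series_cardinality_query("foo", "cpu"))
    sample = {"results": [{"statement_id": 0, "series": [
        {"name": "cpu", "columns": ["count"], "values": [[92035]]}]}]}
    print(parse_count(sample))
```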

Status: NEW → ASSIGNED

I've picked up that PR again and think it's in good shape. Once it's reviewed I will merge it and set up the Jenkins job.

I've merged the code. Need to make some permissions changes then write the Jenkinsfile.

Brain dump: next steps here are something like

1a) create a new influxdb user and give them perms to read from all dbs and write to svcops.
1b) document granting them read for any new dbs
1c) switch creds in jenkins to that user
2) change endpoint config in jenkins to remove path
3) write and deploy jenkinsfile, scheduling daily run
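
For steps 1a and 1b, the grants could be generated rather than typed by hand. This is only a sketch: the user name in the example is hypothetical, and it emits GRANT ALL for svcops when that database also needs to be readable, on the assumption that a user holds a single privilege level per database.

```python
def quote_identifier(name):
    """Double-quote an InfluxQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def grant_statements(user, read_databases, write_database="svcops"):
    """Return GRANT statements: read on every db, write on write_database."""
    statements = []
    for database in sorted(set(read_databases) - {write_database}):
        statements.append("GRANT READ ON %s TO %s" % (
            quote_identifier(database), quote_identifier(user)))
    # Assumption: one privilege level per db, so read plus write means ALL.
    privilege = "ALL" if write_database in read_databases else "WRITE"
    statements.append("GRANT %s ON %s TO %s" % (
        privilege, quote_identifier(write_database), quote_identifier(user)))
    return statements


if __name__ == "__main__":
    # "cardinality_tracker" is a made-up user name.
    for statement in grant_statements("cardinality_tracker", ["foo", "svcops"]):
        print(statement)
```

Rerunning it with the current database list would cover the grant needed for any newly created db.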

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED