Bug 841441 - Increase retention time in Graphite

Eric Ziegenhorn :ericz (Assignee)

Description

11 years ago

We need to figure out what a good retention time is for Graphite, based on how much storage we have and the projected number of metrics.

Shyam Mani [:fox2mike]

Comment 1

11 years ago

Of course the best-case scenario would be forever ;) But that's probably not practical.

What are our options here? Can we have a year or two of data live or more? Can we "archive" old data and bring it back online if needed?

Eric Ziegenhorn :ericz (Assignee)

Comment 2

11 years ago

I'll see how many metrics we have and try to project disk usage in a few scenarios. To give you an idea of production use: in my experience, the typical setup was to store most metrics for a week, some for longer (e.g. a month), and some for a shorter time.
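A minimal sketch of such a projection follows. It assumes whisper's on-disk layout as we understand it: a 16-byte header, 12 bytes per archive descriptor and 12 bytes per stored point. Because whisper files are fully preallocated, the size depends only on the retention scheme. The metric count of 100,000 and the one-minute resolution are hypothetical placeholders, not measured figures.

```python
# Assumed byte sizes from whisper's on-disk format:
# header "!2LfL", per-archive info "!3L", per-point "!Ld".
METADATA_SIZE = 16
ARCHIVE_INFO_SIZE = 12
POINT_SIZE = 12


def whisper_file_size(archives):
    """Bytes for one whisper file with the given (seconds_per_point, points) archives."""
    points = sum(p for _, p in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * points


def project_disk_usage(num_metrics, archives):
    """Total bytes for num_metrics files that share one retention scheme."""
    return num_metrics * whisper_file_size(archives)


DAY = 86400
# Hypothetical scenarios: minutely data kept for a week or for a month.
scenarios = {
    "1 week of minutely data": [(60, 7 * DAY // 60)],
    "1 month of minutely data": [(60, 30 * DAY // 60)],
}

NUM_METRICS = 100_000  # hypothetical metric count, not a measured figure
for name, archives in scenarios.items():
    gib = project_disk_usage(NUM_METRICS, archives) / 2**30
    print(f"{name}: {gib:.1f} GiB")
```

Filesystem overhead is ignored here. Plugging in the real metric count, once it is known, is the only change the projection needs.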

casey ransom [:casey]

Comment 3

11 years ago

(In reply to Shyam Mani [:fox2mike] from comment #1)
> What are our options here? Can we have a year or two of data live or more?
> Can we "archive" old data and bring it back online if needed?
that's not really how round robin databases work. they allocate the entire size of the file up front, since they are meant to write over old data when the retention time completes. rrd (à la rrdtool) is very rigid (and fast), but once you write the data, it's a bit painful to change the rollups/retention time/etc. whisper files (à la carbon/graphite) are more flexible (and slower), but i haven't done much with resizing those or seeing what you lose when doing so.
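The preallocation described above can be illustrated with a simplified in-memory round-robin archive. `RoundRobinArchive` is a hypothetical class, not rrdtool's or whisper's code. Every slot exists from the start, and each point's slot is its aligned timestamp modulo the archive length. Once the window is full, new points silently replace the oldest ones.

```python
class RoundRobinArchive:
    """Simplified, hypothetical round-robin archive (not rrdtool or whisper code)."""

    def __init__(self, seconds_per_point, points):
        self.step = seconds_per_point
        self.points = points
        # Preallocate every slot, as an RRD or whisper file is sized at creation.
        self.slots = [(0, None)] * points

    def _slot(self, aligned):
        return (aligned // self.step) % self.points

    def write(self, timestamp, value):
        aligned = timestamp - (timestamp % self.step)
        # Overwrites whatever older point previously occupied this slot.
        self.slots[self._slot(aligned)] = (aligned, value)

    def read(self, timestamp):
        aligned = timestamp - (timestamp % self.step)
        stored_ts, value = self.slots[self._slot(aligned)]
        # A different stored timestamp means the point was overwritten or never written.
        return value if stored_ts == aligned else None


# Usage: 3 minutely slots means only the last 3 minutes survive.
archive = RoundRobinArchive(seconds_per_point=60, points=3)
for minute in range(5):
    archive.write(minute * 60, float(minute))
print(archive.read(0))    # None (slot reused by minute 3)
print(archive.read(240))  # 4.0
```

Because the slot count is fixed at creation, keeping data longer means creating a larger file. Old data cannot simply be archived out of place.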

Imo, we should shoot for a minimum of 18 months with as much granularity as we can spare. We can fudge some on the granularity if we need more space, but having a year+ lets you see seasonal changes/aberrations and makes some growth more visible.

Rick Bryce [:rbryce]

Comment 4

11 years ago

(In reply to casey ransom [:casey] from comment #3)
> (In reply to Shyam Mani [:fox2mike] from comment #1)
> > What are our options here? Can we have a year or two of data live or more?
> > Can we "archive" old data and bring it back online if needed?
> that's not really how round robin databases work. they allocate the entire
> size of the file they are going to use as they are meant to write over old
> data when retention time completes.  rrd (à la rrdtool) is very rigid (and
> fast) but once you write the data, it's a bit painful to change the
> rollups/retention time/etc. whisper files (à la carbon/graphite) are more
> flexible (and slower), but i haven't done much on resizing those and what
> you lose when doing so.

Spot on here.  When we turn the retention up, we will see what happens to the whisper files.  I /think/ it will just pad the files with zeros.

> Imo, we should shoot for a minimum of 18 months with as much granularity as
> we can spare. We can fudge some on the granularity if we need more space,
> but having a year+ lets you see seasonal changes/aberrations and makes some
> growth more visible.
Good point about seasonal changes/aberrations. Eric and I discussed 12 months earlier today. That should be no problem, and 18 months should be doable. First, though, we need a better count of hosts in each data center and the average number of metrics per host.

Eric: I say we turn the retention up to 12 months and unleash all the collectd hosts. From there we can measure the storage and make a better decision about an 18-month retention.

casey ransom [:casey]

Comment 5

11 years ago

(In reply to Rick Bryce [:rbryce] from comment #4)
> Spot on here.  When we turn the retention up, we will see what happens to
> the whisper files.  I /think/ it will just pad the files with zeros.

I was morbidly curious, so I looked at whisper-resize.py. It renames the old file, creates the new one, then backfills the old data into the new file. That may take a while to complete, but it should be an online operation. ymmv in a cluster, though.
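The rename/create/backfill sequence can be sketched as follows. This is a hypothetical illustration, not whisper-resize.py's actual code. A JSON file of timestamp/value pairs stands in for a whisper file, and only the retention changes while the resolution stays fixed. In this sketch nothing exists at `path` between the rename and the final write, which is one place concurrent writers could be affected.

```python
import json
import os
import tempfile


def resize_retention(path, step, new_points, now):
    """Hypothetical rename/create/backfill resize (not whisper-resize.py's code).

    The archive is a JSON object mapping timestamps to values, standing in for a
    whisper file. Only the retention changes; the resolution (step) stays the same.
    """
    backup = path + ".bak"
    os.rename(path, backup)  # 1. rename the old file out of the way

    with open(backup) as f:
        old_points = {int(ts): value for ts, value in json.load(f).items()}

    # 2. create the new file (whisper would preallocate it for the new retention)
    # 3. backfill every old point that still falls inside the new window
    oldest_kept = now - step * new_points
    kept = {ts: value for ts, value in old_points.items() if ts > oldest_kept}
    with open(path, "w") as f:
        json.dump({str(ts): value for ts, value in sorted(kept.items())}, f)
    return backup


# Usage: shrink a 5-point minutely archive to 3 points.
with tempfile.TemporaryDirectory() as tmp:
    metric = os.path.join(tmp, "metric.json")
    with open(metric, "w") as f:
        json.dump({str(t): float(t // 60) for t in range(0, 300, 60)}, f)
    resize_retention(metric, step=60, new_points=3, now=240)
    with open(metric) as f:
        print(json.load(f))  # {'120': 2.0, '180': 3.0, '240': 4.0}
```

Growing the retention works the same way. The new window simply keeps every old point, and the extra slots start out empty.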

Eric Ziegenhorn :ericz (Assignee)

Comment 6

11 years ago

This is mostly complete, with a little cleanup still to do all around. We are fairly disk-constrained at the moment, so as a compromise I have instated the following retention scheme:

Minutely data stored for 30 days
Hourly data stored for 2 years
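This scheme would typically be written as a Carbon retention string such as `1m:30d,1h:2y`. That string is an assumption, since the actual storage-schemas.conf entry is not shown here. The sketch below parses that suffixed form with a hypothetical helper, `parse_retentions`, and estimates the size of each whisper file. It assumes whisper's units, where "m" is minutes and "y" is 365 days, and its byte layout of a 16-byte header, 12 bytes per archive descriptor and 12 bytes per point.

```python
# Seconds per unit; "m" means minutes and "y" means 365 days, as in whisper.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_retentions(spec):
    """Hypothetical parser: 'precision:retention,...' -> [(seconds_per_point, points)]."""
    archives = []
    for part in spec.split(","):
        precision, retention = part.strip().split(":")
        step = int(precision[:-1]) * UNITS[precision[-1]]
        keep = int(retention[:-1]) * UNITS[retention[-1]]
        archives.append((step, keep // step))
    return archives


def whisper_file_size(archives):
    # 16-byte header + 12 bytes per archive descriptor + 12 bytes per data point.
    return 16 + 12 * len(archives) + 12 * sum(points for _, points in archives)


archives = parse_retentions("1m:30d,1h:2y")
print(archives)                     # [(60, 43200), (3600, 17520)]
print(whisper_file_size(archives))  # 728680
```

That works out to roughly 712 KiB per metric. Multiplying by the metric count gives the total disk needed. This sketch handles only suffixed values, while Carbon also accepts other forms.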

Eric Ziegenhorn :ericz (Assignee)

Comment 7

11 years ago

We're getting a lot more metrics from collectd than expected so we're going to have to reduce retention time further until we get some bigger disks.  For now, I've chosen:

Minutely data for 20 days
Hourly data for 1 year
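Under this scheme, the resolution a graph gets depends on how far back it starts. The sketch below assumes whisper-style selection as we understand it: one archive serves the whole query, and it is the finest archive whose retention still covers the start time. `resolution_for_query` is a hypothetical helper, not part of whisper's API.

```python
DAY = 86400

# The reduced scheme: minutely for 20 days, hourly for 1 year.
# Each entry is (seconds_per_point, retention_seconds), finest resolution first.
ARCHIVES = [(60, 20 * DAY), (3600, 365 * DAY)]


def resolution_for_query(archives, now, from_time):
    """Step of the finest archive covering from_time; else fall back to the coarsest."""
    age = now - from_time
    for step, retention in archives:
        if retention >= age:
            return step
    return archives[-1][0]


now = 400 * DAY
print(resolution_for_query(ARCHIVES, now, now - 7 * DAY))   # 60
print(resolution_for_query(ARCHIVES, now, now - 21 * DAY))  # 3600
```

Any graph that reaches past 20 days therefore drops straight from one-minute to one-hour points, with no intermediate resolution.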

Eric Ziegenhorn :ericz (Assignee)

Updated

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

casey ransom [:casey]

Comment 8

11 years ago

are there any plans to go beyond 1 year? hourly-only data from 20 days out to 1 year is really harsh.

Eric Ziegenhorn :ericz (Assignee)

Comment 9

11 years ago

:casey yes, see bug 847994. What would be a good trade-off between disk space and retention time for you? I'm hoping to retain more, but I don't think this is bad either.

Nobody; OK to take it and work on it

Updated

9 years ago

Product: mozilla.org → mozilla.org Graveyard

Categories

(mozilla.org Graveyard :: Server Operations, task)

Tracking

(Not tracked)

People

(Reporter: ericz, Assigned: ericz)

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Updated