Closed
Bug 841441
Opened 11 years ago
Closed 11 years ago
Increase retention time in Graphite
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ericz, Assigned: ericz)
Details
We need to figure out what a good retention time is for Graphite based on how much storage we have and projected number of metrics.
Comment 1•11 years ago
Of course the best case scenario would be forever ;) But that's probably not practical. What are our options here? Can we have a year or two of data live or more? Can we "archive" old data and bring it back online if needed?
Comment 2•11 years ago (Assignee)
I'll see how many metrics we have and try to project disk usage in a few scenarios. To give you an idea of production use, in my experience the typical scenario was to store most metrics for a week, some for longer (e.g. a month) and some for a shorter time.
Comment 3•11 years ago
(In reply to Shyam Mani [:fox2mike] from comment #1)
> What are our options here? Can we have a year or two of data live or more?
> Can we "archive" old data and bring it back online if needed?

that's not really how round robin databases work. they allocate the entire size of the file they are going to use as they are meant to write over old data when retention time completes. rrd (ala rrdtool) is very rigid (and fast) but once you write the data, it's a bit painful to change the rollups/retention time/etc. whisper files (ala carbon/graphite) are more flexible (and slower), but i haven't done much on resizing those and what you lose when doing so.

Imo, we should shoot for a minimum of 18 months with as much granularity as we can spare. We can fudge some on the granularity if we need more space, but having a year+ lets you see seasonal changes/aberrations and makes some growth more visible.
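To illustrate the preallocate-and-overwrite behaviour described in this comment, here is a minimal toy model in Python. It is only a sketch: the class name `RoundRobinArchive` and its slot layout are made up for illustration and are not the rrdtool or whisper on-disk format.

```python
class RoundRobinArchive:
    """Toy fixed-size archive (hypothetical, not the real whisper format)."""

    def __init__(self, seconds_per_point, points):
        self.step = seconds_per_point
        self.points = points
        # Every slot is allocated up front, so the size never grows.
        self.slots = [None] * points

    def _align(self, timestamp):
        # Round the timestamp down to the start of its interval.
        return timestamp - (timestamp % self.step)

    def write(self, timestamp, value):
        aligned = self._align(timestamp)
        index = (aligned // self.step) % self.points
        # Once the ring wraps, this overwrites the oldest point.
        self.slots[index] = (aligned, value)

    def read(self, timestamp):
        aligned = self._align(timestamp)
        slot = self.slots[(aligned // self.step) % self.points]
        # A slot only answers for the timestamp it currently holds.
        if slot is not None and slot[0] == aligned:
            return slot[1]
        return None


archive = RoundRobinArchive(seconds_per_point=60, points=3)
for ts, value in [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0)]:
    archive.write(ts, value)
```

With three slots and four writes, the point at timestamp 0 has been overwritten by the one at 180, which is why old data cannot simply be "archived" out of such a file after the fact.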
Comment 4•11 years ago
(In reply to casey ransom [:casey] from comment #3)
> (In reply to Shyam Mani [:fox2mike] from comment #1)
> > What are our options here? Can we have a year or two of data live or more?
> > Can we "archive" old data and bring it back online if needed?
> that's not really how round robin databases work. they allocate the entire
> size of the file they are going to use as they are meant to write over old
> data when retention time completes. rrd (ala rrdtool) is very rigid (and
> fast) but once you write the data, it's a bit painful to change the
> rollups/retention time/etc. whisper files (ala carbon/graphite) are more
> flexible (and slower), but i haven't done much on resizing those and what
> you lose when doing so.

Spot on here. When we turn the retention up, we will see what happens to the whisper files. I /think/ it will just pad the files with zeros.

> Imo, we should shoot for a minimum of 18 months with as much granularity as
> we can spare. We can fudge some on the granularity if we need more space,
> but having a year+ lets you see seasonal changes/aberrations and makes some
> growth more visible.

Good point about seasonal changes/aberrations. Eric and I discussed 12 months earlier today. That should be no problem, and 18 months should be doable. We need to get a better count of hosts in each data center and the average number of metrics per host first.

Eric: I say we turn the retention up to 12 months and unleash all the collectd hosts. From there we can measure the storage and make a better decision about an 18 month retention.
Comment 5•11 years ago
(In reply to Rick Bryce [:rbryce] from comment #4)
> Spot on here. When we turn the retention up, we will see what happens to
> the whisper files. I /think/ it will just pad the files with zeros.

I was morbidly curious so I looked at whisper-resize.py. It renames the old file, creates the new one, then backfills the old data into the new file. That may take a while to complete but should be an online operation. ymmv in a cluster though.
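The create-then-backfill idea can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not the code in whisper-resize.py: the function `resize_archive` and the list-of-slots layout are hypothetical, and no files are renamed or written.

```python
def resize_archive(old_slots, step, new_size, now):
    """Build a new fixed-size slot list and backfill it from the old one.

    old_slots: list of (timestamp, value) tuples or None for empty slots.
    step:      seconds per point (unchanged by the resize in this toy).
    new_size:  number of slots in the new archive.
    now:       newest timestamp the new archive should cover.
    """
    # "Create the new": allocate every slot up front, all empty.
    new_slots = [None] * new_size
    oldest_kept = now - step * new_size

    # "Backfill old into new": copy each point that still fits the window.
    for point in old_slots:
        if point is None:
            continue
        timestamp, value = point
        if timestamp <= oldest_kept or timestamp > now:
            continue  # outside the new retention window, so it is dropped
        new_slots[(timestamp // step) % new_size] = (timestamp, value)
    return new_slots


old = [(0, 1.0), (60, 2.0), (120, 3.0)]
grown = resize_archive(old, step=60, new_size=5, now=120)
shrunk = resize_archive(old, step=60, new_size=2, now=120)
```

In this toy, growing the archive keeps every old point and leaves the extra slots empty, while shrinking it drops whatever falls outside the shorter window.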
Comment 6•11 years ago (Assignee)
This has mostly been completed, with a little cleanup still to do all around. We are fairly disk constrained at the moment so, as a compromise, we have instated the following retention scheme:

Minutely data stored for 30 days
Hourly data stored for 2 years
Comment 7•11 years ago (Assignee)
We're getting a lot more metrics from collectd than expected, so we're going to have to reduce retention time further until we get some bigger disks. For now, I've chosen:

Minutely data for 20 days
Hourly data for 1 year
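For a rough sense of what the two retention schemes in this bug cost on disk, here is a small Python estimate. It assumes the whisper file layout of a 16-byte metadata header, 12 bytes of header per archive, and 12 bytes per stored point, and it writes the schemes in the `precision:duration` style used by carbon's retention settings. The helper names are made up for this sketch, and a year is taken as 365 days.

```python
METADATA_BYTES = 16      # assumed whisper file header size
ARCHIVE_INFO_BYTES = 12  # assumed per-archive header size
POINT_BYTES = 12         # assumed size of one (timestamp, value) point

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 86400 * 365}


def to_seconds(text):
    """Convert a string such as '1m' or '30d' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def whisper_file_bytes(retentions):
    """Estimate the size of one metric's file for a retention string."""
    archives = retentions.split(",")
    total = METADATA_BYTES + ARCHIVE_INFO_BYTES * len(archives)
    for archive in archives:
        precision, duration = archive.split(":")
        points = to_seconds(duration) // to_seconds(precision)
        total += points * POINT_BYTES
    return total


# Comment 6 scheme, then the reduced scheme from this comment.
for scheme in ("1m:30d,1h:2y", "1m:20d,1h:1y"):
    print(scheme, whisper_file_bytes(scheme), "bytes per metric")
```

Under these assumptions the first scheme comes to 728680 bytes per metric and the reduced one to 450760 bytes, so multiplying by the number of metrics gives a first-order disk projection.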
Updated•11 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 8•11 years ago
are there any intentions of going further than 1 year? hourly data from 20 days to 1 year is really harsh.
Comment 9•11 years ago (Assignee)
:casey yes, see bug 847994. What would be a good trade-off between disk space and retention time for you? I'm hoping to retain more but I don't think this is bad either.
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard