Closed
Bug 775600
Opened 12 years ago
Closed 12 years ago
Delete briarpatch-graphite-stage1.private.scl3
Categories
(Infrastructure & Operations :: Virtualization, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: gcox)
References
Details
(Whiteboard: [briarpatch])
The whisper db on briarpatch-graphite-stage1 is growing without bounds. I shut off the carbon collector this morning.
If this is truly just a staging instance, we should have some kind of data culling in place to make sure we don't fill up the disk.
Comment 1•12 years ago
Until the production VM is set up in PHX (which was supposed to happen sometime this week), :lonnen was using the staging instance to test with. We did not know how large the whisper database was going to get on disk, and we wanted to find out what the size would be given the graphite configuration.
Comment 2•12 years ago
Whisper should be downsampling the values to lower precision as they age. It's unclear to me how specifying multiple retentions differs from (or interacts with) the storage-aggregation.conf file. I will investigate further tonight.
Downsampling should work. If we need to cull data, we can change the retentions in storage-schemas to hold the data for a shorter period of time.
The docs mention a tool called whisper-resize.py for modifying existing whisper dbs, but a search for it doesn't show much more info. I'll also look into using that to trim what we already have.
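For reference, this is roughly how an invocation would look, assuming the stock whisper-resize.py that ships with the whisper package (the metric path here is invented):

  # Rewrite a .wsp file with new retention archives.
  # Each positional argument after the path is an archive, as precision:duration.
  whisper-resize.py /opt/graphite/storage/whisper/some/metric.wsp 60s:12h 1h:10d 1d:3y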
Comment 3•12 years ago
I modified the storage-schemas file to retain the data binned at the same resolution for less time:
60 second resolution for 12 hours
1 hour resolution for 10 days
1 day resolution for 3 years
I also created the storage-aggregation file, correcting a bug in the process: carbon should now aggregate values with sum instead of mean.
I believe carbon will need to be restarted for these changes to take effect, but they should help with the growth problem. It looks like the service has been more or less up for 10 days, which is reassuring from a disk space perspective.
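For concreteness, a minimal sketch of what those two files might look like. Only the retentions and the sum aggregation are stated above; the section names, patterns, and xFilesFactor are assumptions:

  # storage-schemas.conf (sketch; section name and pattern assumed)
  [briarpatch]
  pattern = .*
  retentions = 60s:12h,1h:10d,1d:3y

  # storage-aggregation.conf (sketch; xFilesFactor assumed)
  [sum_everything]
  pattern = .*
  xFilesFactor = 0
  aggregationMethod = sum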
Updated•12 years ago
QA Contact: lsblakk → hwine
Comment 4•12 years ago
The hard drive is full again on this box, and it's mostly graphite data that's the problem.
Reporter
Comment 5•12 years ago
(In reply to Chris Lonnen :lonnen from comment #3)
> 60 second resolution for 12 hours
> 1 hour resolution for 10 days
> 1 day resolution for 3 years
:lonnen: can you tell which of these buckets is responsible and adjust accordingly? I don't think we need 60 second resolution at all, frankly.
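If the 60-second archive is indeed the culprit, the change could be as simple as dropping that archive from the retentions line sketched above (again an assumption, not a tested config):

  retentions = 1h:10d,1d:3y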
Comment 6•12 years ago
Coop: could you (or someone else from releng) recommend more sensible retention rates? If we don't need 60 seconds, we can drop that entirely for now.
Reporter | ||
Comment 7•12 years ago
(In reply to Chris Lonnen :lonnen from comment #6)
> Coop: could you (or someone else from releng) recommend more sensible
> retention rates? If we don't need 60 seconds, we can drop that entirely for
> now.
I will poll the group and see what we want.
Comment 8•12 years ago
The disk has filled up again, FYI:
[root@briarpatch-graphite-stage1.private.scl3 ~]# df -h /dev/sda3
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 194G 186G 0 100% /
Reporter
Comment 9•12 years ago
[12:48pm] mburns: coop: ping re: https://bugzilla.mozilla.org/show_bug.cgi?id=775600
[12:51pm] coop: mburns: pong
[12:52pm] mburns: Have you gotten a chance to poll the group on retention policies for briarpatch-graphite-stage1, or what older stuff can safely be rm'd until you can?
[12:55pm] coop: mburns: we've decided to just shut it off
[12:56pm] mburns: so want to decommission the whole server, or just turn it down for a while?
[12:56pm] armenzg_lunch is now known as armenzg.
[12:56pm] coop: we don't have anyone available to take over that work, and the data is not helping us at present
[12:57pm] coop: we'll turn off the collectors for now, and will most likely decommission the server soon
Comment 10•12 years ago
Per comment 9, we have no use for this disk. Please decommission.
Assignee: nobody → server-ops-storage
Component: Release Engineering: Developer Tools → Server Operations: Storage
QA Contact: hwine → dparsons
Comment 11•12 years ago
The disk is attached to a VM, so decommissioning the disk implies getting rid of the VM.
Please pass the VM info to :jhopkins and he'll make the final call.
Assignee
Updated•12 years ago
Flags: needinfo?(jhopkins)
Comment 12•12 years ago
Please proceed with deleting the briarpatch-graphite-stage1.private.scl3 VM.
Flags: needinfo?(jhopkins)
Assignee
Updated•12 years ago
Assignee: server-ops-storage → gcox
Component: Server Operations: Storage → Server Operations: Virtualization
Assignee
Comment 13•12 years ago
Powered off VM, in case there's screaming.
Severity: normal → minor
Summary: Disk on briarpatch-graphite-stage1 is full → Delete briarpatch-graphite-stage1.private.scl3
Assignee
Comment 14•12 years ago
Removed from RHN and inventory.
Pulled from DNS (change 65310) and puppet (change 65313).
Deleted VM from disk.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations