Closed Bug 775600 Opened 12 years ago Closed 12 years ago

Delete briarpatch-graphite-stage1.private.scl3

Categories

(Infrastructure & Operations :: Virtualization, task)

Platform: x86 Linux
Type: task
Priority: Not set
Severity: minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: gcox)

References

Details

(Whiteboard: [briarpatch])

The whisper db on briarpatch-graphite-stage1 is growing without bounds. I shut off the carbon collector this morning. If this is truly just a staging instance, we should have some kind of data culling in place to make sure we don't fill up the disk.
Until the production VM is set up in PHX (which was supposed to happen sometime this week), :lonnen was using the staging instance to test with. We did not know how large the on-disk database was going to get, and we wanted to find out what the size would be given the graphite configuration.
Whisper should be downsampling the values to lower precision. It's unclear to me what the difference is between specifying multiple retentions and including the storage-aggregation.conf file; I will investigate further tonight. Downsampling should work. If we need to cull data, we can change the retentions in storage-schemas to hold the data for a shorter period of time. The docs mention a tool called whisper-resize.py for modifying existing whisper dbs, but a search for it doesn't turn up much more info. I'll also look into using that to trim what we already have.
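For reference, the distinction works roughly like this under Graphite 0.9.x conventions (a sketch; the section names, pattern, and metric names below are illustrative, not taken from this box): multiple retentions in storage-schemas.conf define how long each downsampled archive is kept, while storage-aggregation.conf only controls how points are rolled up into the coarser archives, not how long anything is retained.

```ini
# storage-schemas.conf -- retention/downsampling steps per metric pattern
[briarpatch]
pattern = ^briarpatch\.
retentions = 60s:12h,1h:10d,1d:3y

# storage-aggregation.conf -- rollup method only (sum vs. average),
# not retention length
[counts]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
```

Existing .wsp files keep the retentions they were created with; whisper-resize.py rewrites a file in place, e.g. `whisper-resize.py some.metric.wsp 60s:12h 1h:10d 1d:3y`, run per file and ideally with carbon stopped.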
I modified the storage-schemas file to retain the data binned at the same resolution for less time:

- 60 second resolution for 12 hours
- 1 hour resolution for 10 days
- 1 day resolution for 3 years

I also created the storage-aggregation file, correcting a bug, so carbon should aggregate values with sum instead of mean. I believe carbon will need to be restarted for these changes to take effect, but they should help with the growth problem. It looks like the service has been more or less up for 10 days, which is reassuring from a disk space perspective.
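As a sanity check on what those retentions mean for disk space, here is a back-of-envelope sketch assuming whisper's documented fixed-size on-disk layout (16-byte metadata header, 12 bytes per archive header, 12 bytes per datapoint); the function name is mine, not anything from this box:

```python
def whisper_file_size(retentions):
    """Approximate size in bytes of one whisper file.

    retentions: list of (seconds_per_point, seconds_retained) tuples.
    Whisper preallocates every archive at creation time, so each
    metric's file is fixed-size; total disk usage grows with the
    number of metrics, not with time.
    """
    points_per_archive = [retained // step for step, retained in retentions]
    return 16 + 12 * len(retentions) + 12 * sum(points_per_archive)

# The scheme from this comment: 60s:12h, 1h:10d, 1d:3y
scheme = [(60, 12 * 3600), (3600, 10 * 86400), (86400, 3 * 365 * 86400)]
print(whisper_file_size(scheme))  # 24712 bytes, roughly 24 KB per metric
```

Under that assumption each metric costs only about 24 KB, which suggests the earlier unbounded growth came from the number of metric files being created rather than from individual files growing.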
QA Contact: lsblakk → hwine
The hard drive is full again on this box and it's mostly graphite data that's the problem.
(In reply to Chris Lonnen :lonnen from comment #3)
> 60 second resolution for 12 hours
> 1 hour resolution for 10 days
> 1 day resolution for 3 years

:lonnen: can you tell which of these buckets is responsible and adjust accordingly? I don't think we need 60 second resolution at all, frankly.
Coop: could you (or someone else from releng) recommend more sensible retention rates? If we don't need 60 seconds, we can drop that entirely for now.
(In reply to Chris Lonnen :lonnen from comment #6)
> Coop: could you (or someone else from releng) recommend more sensible
> retention rates? If we don't need 60 seconds, we can drop that entirely for
> now.

I will poll the group and see what we want.
The disk has filled up again, FYI:

[root@briarpatch-graphite-stage1.private.scl3 ~]# df -h /dev/sda3
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       194G  186G     0 100% /
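An aside on why df reports 0 available even though only 186G of 194G is used: ext filesystems typically reserve about 5% of blocks for root (an assumption here; the actual ratio on this box would show up in `tune2fs -l /dev/sda3`), and df's Avail column excludes that reservation. A quick check of the arithmetic:

```python
size_gb, used_gb = 194, 186
reserved_gb = 0.05 * size_gb  # default ext reserved-blocks ratio (assumed)
avail_to_users = max(0.0, size_gb - used_gb - reserved_gb)
print(round(reserved_gb, 1), avail_to_users)  # 9.7 0.0
```

So with ~9.7G reserved and only 8G of raw space left, non-root users see 0 available, which is consistent with the output above.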
Blocks: 786712
[12:48pm] mburns: coop: ping re: https://bugzilla.mozilla.org/show_bug.cgi?id=775600
[12:51pm] coop: mburns: pong
[12:52pm] mburns: Have you gotten a chance to poll the group on retention policies for briarpatch-graphite-stage1, or what older stuff can safely be rm'd until you can?
[12:55pm] coop: mburns: we've decided to just shut it off
[12:56pm] mburns: so want to decommission the whole server, or just turn it down for a while?
[12:56pm] armenzg_lunch is now known as armenzg.
[12:56pm] coop: we don't have anyone available to take over that work, and the data is not helping us at present
[12:57pm] coop: we'll turn off the collectors for now, and will most likely decommission the server soon
Per comment 9, we have no use for this disk. Please decommission.
Assignee: nobody → server-ops-storage
Component: Release Engineering: Developer Tools → Server Operations: Storage
QA Contact: hwine → dparsons
The disk is attached to a VM, so decommissioning the disk implies getting rid of the VM. Please pass the VM info to :jhopkins and he'll make the final call.
Flags: needinfo?(jhopkins)
Please proceed with deleting the briarpatch-graphite-stage1.private.scl3 VM.
Flags: needinfo?(jhopkins)
Assignee: server-ops-storage → gcox
Component: Server Operations: Storage → Server Operations: Virtualization
Powered off VM, in case there's screaming.
Severity: normal → minor
Summary: Disk on briarpatch-graphite-stage1 is full → Delete briarpatch-graphite-stage1.private.scl3
Removed from RHN and inventory. Pulled from DNS (change 65310) and puppet (change 65313). Deleted VM from disk.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations