Closed Bug 1079642 Opened 10 years ago Closed 10 years ago

Prod Collectors using wrong filesystem storage class

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: lars, Assigned: dmaher)

References

Details

Attachments

(1 file)

updated FS type, 10 years ago, Daniel Maher [:phrawzty], 1.06 KB, patch

K Lars Lohn [:lars] [:klohn]

Reporter

Description

10 years ago

It appears that on October 1, 2013, all the Production collectors were reverted to the old-style Crash Storage classes, which don't clean up after themselves.

They need to be migrated back to FSTemporaryStorage.

K Lars Lohn [:lars] [:klohn]

Reporter

Comment 1

10 years ago

I've mitigated the problem for three months on C1, C2, C3 with:

    sudo rm -rf ~socorro/primaryCrashStore/2013????

I'll hit the other collectors tomorrow when these are done.

K Lars Lohn [:lars] [:klohn]

Reporter

Comment 2

10 years ago

All six prod collectors have now had 90 days of old empty directories purged. We will have to clear out all of it eventually, but now we've got a 90-day buffer before we encounter the problem again.

Each evening when the load is low, I'm going to continue this process of clearing out old directories.
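To make the nightly cleanup more selective than a year-wide glob, the candidates can be listed first. This is a minimal sketch, not the command that was actually run: it assumes the directories directly under primaryCrashStore are named YYYYMMDD (which the `2013????` glob in comment 1 suggests), and `stale_dated_dirs` and `keep_days` are hypothetical names. It only lists directories and never deletes anything.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def stale_dated_dirs(root, today=None, keep_days=90):
    """Return YYYYMMDD-named subdirectories of root older than keep_days.

    Assumption: the dated directories sit directly under root.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for entry in sorted(Path(root).iterdir()):
        name = entry.name
        # Only consider directories whose name is exactly eight digits.
        if not entry.is_dir() or len(name) != 8 or not name.isdigit():
            continue
        try:
            day = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits but not a valid date; leave it alone
        if day < cutoff:
            stale.append(entry)
    return stale
```

Reviewing the returned list before removing anything keeps the 90-day buffer intact and leaves non-dated entries untouched.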

Lonnen :lonnen

Comment 3

10 years ago

New plan -- disable puppet on all, remove half of the collectors from zeus, let them drain, upgrade them, return them to zeus, then repeat the last 4 steps for the other half and re-enable puppet on all.

It's a little more manually intensive and it involves touching zeus. It also cuts our capacity in half, but we can handle 3x normal peak load so if we do it off peak hours it should be fine. If phrawzty can help with zeus we can do it early PST and everything will work out.

There's no rush; we can do it when Phrawzty gets back next week iff he has permissions to modify zeus still. If not, we can schedule with webops.
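The capacity claim above can be checked with a quick calculation. This is only a sketch under two assumptions that the thread does not state: load is spread evenly across collectors, and capacity scales linearly with the number of collectors in rotation. The function name is hypothetical.

```python
def headroom_during_upgrade(total, removed, capacity_vs_peak):
    """Capacity of the collectors still in rotation, as a multiple of normal peak load.

    Assumes even load distribution and linear scaling with collector count.
    """
    remaining = total - removed
    return capacity_vs_peak * remaining / total


# Six collectors (per comment 2), three pulled from zeus, full fleet rated at 3x peak.
print(headroom_during_upgrade(6, 3, 3.0))  # 1.5
```

Under those assumptions the remaining half still covers 1.5x normal peak, which is why doing the work off peak leaves a comfortable margin.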

K Lars Lohn [:lars] [:klohn]

Reporter

Comment 4

10 years ago

The steps of the plan (see https://etherpad.mozilla.org/Atkhh07tOK for discussion and plan alternatives):

1) Disable puppet on all
2) Update svn config with the new configuration
2.1) in both 'collector.ini' and 'crashmover.ini' replace 'socorro.external.fs.crashstorage.FSLegacyDatedRadixTreeStorage' with 'socorro.external.fs.crashstorage.FSTemporaryStorage'
3) Remove half of the collectors from zeus, let them drain
4) after complete drainage
4.1) mv $SOCORRO_HOME/primaryCrashStore $SOCORRO_HOME/retired/primaryCrashStore
4.2) mkdir $SOCORRO_HOME/primaryCrashStore
4.3) chown -R apache:socorro $SOCORRO_HOME/primaryCrashStore
4.4) chmod -R g+ws $SOCORRO_HOME/primaryCrashStore
4.5) chmod -R o+rx $SOCORRO_HOME/primaryCrashStore
5) Run puppet manually on the removed set
5.1) restart Apache & crashmovers
5.2) verify that logged current config is correct
6) Return them to Zeus
7) Watch for trouble
8) repeat 3-7 with the other set of collectors
9) Enable puppet on all
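Step 2.1 amounts to a plain string substitution in the two ini files. A minimal sketch of that substitution follows; the two class paths are the ones named in step 2.1, while the function name is hypothetical and the layout of the real config files is not shown in this bug. Returning the match count makes it easy to confirm that each file actually referenced the legacy class.

```python
OLD = "socorro.external.fs.crashstorage.FSLegacyDatedRadixTreeStorage"
NEW = "socorro.external.fs.crashstorage.FSTemporaryStorage"


def switch_storage_class(text):
    """Return (new_text, count): text with OLD replaced by NEW, and how many times OLD occurred."""
    return text.replace(OLD, NEW), text.count(OLD)


# Hypothetical config line; the real key names in collector.ini may differ.
sample = "some_storage_key=" + OLD + "\n"
updated, count = switch_storage_class(sample)
print(count)    # 1
print(updated)  # some_storage_key=socorro.external.fs.crashstorage.FSTemporaryStorage
```

A count of zero for either file would mean it was already migrated or uses a different setting, which is worth checking before step 5.2.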

Daniel Maher [:phrawzty]

Assignee

Comment 5

10 years ago

(In reply to Chris Lonnen :lonnen from comment #3)
> There's no rush; we can do it when Phrawzty gets back next week iff he has
> permissions to modify zeus still. If not, we can schedule with webops.

It would appear that I can still log into the Zeus admin panel, so that's good.

Daniel Maher [:phrawzty]

Assignee

Comment 6

10 years ago

This manipulation is currently scheduled for Wednesday, 22 October 2014, 07:00:00 UTC [1].

[1] http://www.timeanddate.com/worldclock/meetingdetails.html?year=2014&month=10&day=22&hour=7&min=0&sec=0&p1=195&p2=179&p3=224

Assignee: nobody → dmaher

Severity: normal → major

Daniel Maher [:phrawzty]

Assignee

Comment 7

10 years ago

Attached patch: updated FS type

Daniel Maher [:phrawzty]

Assignee

Comment 8

10 years ago

$ svn ci -m 'update FS type; bug 1079642'
Sending        collector.ini
Sending        crashmover.ini
Transmitting file data ..
Committed revision 95255.

Status: NEW → ASSIGNED

Daniel Maher [:phrawzty]

Assignee

Comment 9

10 years ago

The manipulation[1] is complete.


[1] https://etherpad.mozilla.org/ep/pad/view/ro.DcUxGDURaBv/rev.1629

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Daniel Maher [:phrawzty]

Assignee

Updated

10 years ago

Blocks: 1087311

Daniel Maher [:phrawzty]

Assignee

Updated

10 years ago

Blocks: 1087414


Categories

(Socorro :: Infra, task)
