Closed
Bug 1079642
Opened 10 years ago
Closed 10 years ago
Prod Collectors using wrong filesystem storage class
Categories: Socorro :: Infra, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: lars; Assignee: dmaher
Attachments: 1 file (patch, 1.06 KB)
It appears that on October 1, 2013, all the production collectors were reverted to the old-style crash storage classes, which don't clean up after themselves. They need to be migrated back to FSTemporaryStorage.
Comment 1 (Reporter) • 10 years ago
I've mitigated the problem on C1, C2, and C3 by purging three months' worth of old directories with:
sudo rm -rf ~socorro/primaryCrashStore/2013????
I'll do the other collectors tomorrow once these are done.
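The `2013????` glob above removes every dated directory from 2013 in one pass. For the repeated evening purges, one way to generalize it is a helper that removes dated directories older than a cutoff. This is a minimal sketch, assuming the primaryCrashStore subdirectories are named YYYYMMDD (as the glob suggests); `purge_before` is a hypothetical name, not part of Socorro.

```shell
# Hypothetical helper (not Socorro tooling): remove dated subdirectories of
# a crash store whose YYYYMMDD names are earlier than a cutoff date.
# Assumes the layout implied by the 2013???? glob.
purge_before() {
    store=$1    # e.g. ~socorro/primaryCrashStore
    cutoff=$2   # e.g. 20140101; directories dated earlier are removed
    for dir in "$store"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue      # skip if the glob matched nothing
        name=$(basename "$dir")
        # Eight-digit dates compare correctly as integers.
        if [ "$name" -lt "$cutoff" ]; then
            echo "removing $dir"
            rm -rf -- "$dir"
        fi
    done
}
```

For example, `sudo sh -c '. ./purge.sh; purge_before ~socorro/primaryCrashStore 20140101'` would match the original command's effect; moving the cutoff forward a few days each evening gives the gradual cleanup without one large I/O spike. Names that are not eight digits are left alone.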
Comment 2 (Reporter) • 10 years ago
All six prod collectors have now had 90 days of old empty directories purged. We will have to clear out all of it eventually, but we now have a 90-day buffer before the problem recurs. Each evening, when the load is low, I'll continue clearing out old directories.
Comment 3 • 10 years ago
New plan: disable Puppet on all collectors, remove half of them from Zeus, let them drain, upgrade them, and return them to Zeus; then repeat the last four steps for the other half and re-enable Puppet on all. It's a little more manually intensive, and it involves touching Zeus. It also cuts our capacity in half, but we can handle 3x normal peak load, so doing it during off-peak hours should be fine. If phrawzty can help with Zeus, we can do it early PST and everything should work out. There's no rush; we can do it when phrawzty gets back next week, provided he still has permissions to modify Zeus. If not, we can schedule it with webops.
Comment 4 (Reporter) • 10 years ago
The steps of the plan (see https://etherpad.mozilla.org/Atkhh07tOK for discussion and alternative plans):
1) Disable Puppet on all collectors.
2) Update the svn config with the new configuration:
   2.1) In both 'collector.ini' and 'crashmover.ini', replace 'socorro.external.fs.crashstorage.FSLegacyDatedRadixTreeStorage' with 'socorro.external.fs.crashstorage.FSTemporaryStorage'.
3) Remove half of the collectors from Zeus and let them drain.
4) Once they have fully drained:
   4.1) mv $SOCORRO_HOME/primaryCrashStore $SOCORRO_HOME/retired/primaryCrashStore
   4.2) mkdir $SOCORRO_HOME/primaryCrashStore
   4.3) chown -R apache:socorro $SOCORRO_HOME/primaryCrashStore
   4.4) chmod -R g+ws $SOCORRO_HOME/primaryCrashStore
   4.5) chmod -R o+rx $SOCORRO_HOME/primaryCrashStore
5) Run Puppet manually on the removed set:
   5.1) Restart Apache and the crashmovers.
   5.2) Verify that the logged current config is correct.
6) Return them to Zeus.
7) Watch for trouble.
8) Repeat steps 3-7 with the other set of collectors.
9) Enable Puppet on all collectors.
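Steps 2.1 and 4.1-4.5 translate fairly directly into shell. The sketch below is hypothetical: the function names are invented, it assumes GNU sed (for in-place `-i`), and it adds two things the plan doesn't spell out: creating `retired/` if missing, and refusing to run if `retired/primaryCrashStore` already exists, since `mv` would otherwise move the current store inside it rather than rename it.

```shell
# Hypothetical helpers for steps 2.1 and 4.1-4.5; assumes GNU sed and a
# POSIX shell. Not the scripts actually used for this bug.

# Step 2.1: swap the legacy storage class for FSTemporaryStorage in each
# given config file (e.g. collector.ini crashmover.ini), editing in place.
switch_storage_class() {
    for ini in "$@"; do
        sed -i 's/socorro\.external\.fs\.crashstorage\.FSLegacyDatedRadixTreeStorage/socorro.external.fs.crashstorage.FSTemporaryStorage/g' "$ini" || return 1
    done
}

# Steps 4.1-4.5: retire the old crash store and create a fresh, empty one.
# $1 = SOCORRO_HOME, $2 = owner:group (defaults to apache:socorro).
rotate_store() {
    home=$1
    owner=${2:-apache:socorro}
    store="$home/primaryCrashStore"
    retired="$home/retired/primaryCrashStore"

    # Refuse to run if a retired copy already exists; otherwise mv would
    # nest the current store inside it instead of renaming it.
    if [ -e "$retired" ]; then
        echo "refusing to overwrite $retired" >&2
        return 1
    fi

    mkdir -p "$home/retired" || return 1    # added: 4.1 assumes this exists
    mv "$store" "$retired" || return 1      # 4.1
    mkdir "$store" || return 1              # 4.2
    chown -R "$owner" "$store" || return 1  # 4.3
    chmod -R g+ws "$store" || return 1      # 4.4
    chmod -R o+rx "$store"                  # 4.5
}
```

On a drained collector this would be run as root, e.g. `switch_storage_class collector.ini crashmover.ini` against the svn checkout and `rotate_store "$SOCORRO_HOME"` on the host. Because the guard makes `rotate_store` fail on a second run, an accidental repeat leaves the retired store untouched.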
Comment 5 (Assignee) • 10 years ago
(In reply to Chris Lonnen :lonnen from comment #3)
> There's no rush; we can do it when Phrawzty gets back next week iff he has
> permissions to modify zeus still. If not, we can schedule with webops.

It would appear that I can still log into the Zeus admin panel, so that's good.
Comment 6 (Assignee) • 10 years ago
This maintenance is currently scheduled for Wednesday, 22 October 2014, 07:00:00 UTC [1].
[1] http://www.timeanddate.com/worldclock/meetingdetails.html?year=2014&month=10&day=22&hour=7&min=0&sec=0&p1=195&p2=179&p3=224
Assignee: nobody → dmaher
Severity: normal → major
Comment 7 (Assignee) • 10 years ago
Comment 8 (Assignee) • 10 years ago
$ svn ci -m 'update FS type; bug 1079642'
Sending collector.ini
Sending crashmover.ini
Transmitting file data ..
Committed revision 95255.
Status: NEW → ASSIGNED
Comment 9 (Assignee) • 10 years ago
The maintenance [1] is complete.
[1] https://etherpad.mozilla.org/ep/pad/view/ro.DcUxGDURaBv/rev.1629
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED