Socorro's file system storage has flaws:
 * if monitor is offline for a while, the filesystem can be overwhelmed with new entries
   - make the file system storage tolerant of an outage of monitor
 * the current system requires periodic cleanup by deleting old unused directories
   - make a system that is self cleaning
 * the file system code is not modular, it's tied closely to collector
   - make a modular set of routines or a class that can be used in a uniform manner by multiple scripts
 * the standard and deferred storage schemes are not parallel. They use different code and different layouts
   - these should be unified
staging deployment imminent...

Upgrading to the new file system

overview:
 1 - replace and start new collector
 2 - wait for the jobs table to be empty before continuing
   2.1 - while waiting, new dumps will accumulate in the new file system. There is no need to worry about overfilling
 3 - stop monitor
 4 - stop processor(s)
 5 - replace monitor
 6 - replace processor(s)
 7 - start new processor(s)
 8 - start new monitor
   8.1 - there will temporarily be a higher than normal load while the processors catch up on the dumps that accumulated after step 2
 9 - replace deferredcleanup cron job
 10 - schedule new deferredcleanup cron job for daily runs
 11 - a couple of weeks later, manually clean up legacy storage
   11.1 - the new and old file systems share the same roots. Delete everything outside the 'name' and 'date' directories in those roots

Details:

collector changes
  replace socorro with the latest
  copy .../socorro/collector/modpython-collector.py to collector.py
  new configuration parameters (see http://code.google.com/p/socorro/wiki/SocorroCollector#logFilePathname)
    logFilePathname
    logFileMaximumSize
    logFileMaximumBackupHistory
    logFileLineFormatString
    logFileErrorLoggingLevel
  removed configuration parameters
    reporterURL

monitor changes
  replace socorro with the latest
  new configuration parameters
    saveSuccessfulMinidumpsTo (see http://code.google.com/p/socorro/wiki/SocorroMonitor#saveSuccessfulMinidumpsTo)
    saveFailedMinidumpsTo (see http://code.google.com/p/socorro/wiki/SocorroMonitor#saveFailedMinidumpsTo)
  removed configuration parameters
    dumpDirDelta
    dateDirDelta
    cleanupDirectoryLoopDelay
    saveMinidumpsTo
    saveFailedMinidumps
    saveProcessedMinidumps
    debug

processor changes
  replace socorro with the latest
  removed configuration parameters
    debug
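For step 11.1, a small script might make the manual cleanup less error prone. The following is only a sketch, not part of Socorro: the function names and the dry-run default are assumptions made for illustration, and the storage root has to be supplied by whoever runs it. The only fact taken from the plan above is that the new layout lives under the 'name' and 'date' directories and everything else in the same root is legacy.

```python
import os
import shutil

# Assumption: 'name' and 'date' are the only top-level directories that the
# new file system layout uses inside a storage root (per step 11.1).
KEEP = frozenset(["name", "date"])


def find_legacy_entries(storage_root, keep=KEEP):
    """Return the sorted full paths of top-level entries outside the kept directories."""
    return sorted(
        os.path.join(storage_root, entry)
        for entry in os.listdir(storage_root)
        if entry not in keep
    )


def remove_legacy_entries(storage_root, dry_run=True, keep=KEEP):
    """Delete legacy entries under storage_root.

    With dry_run=True (the default) nothing is deleted; the paths that would
    be removed are printed and returned so they can be reviewed first.
    """
    legacy = find_legacy_entries(storage_root, keep)
    for path in legacy:
        if dry_run:
            print("would remove %s" % path)
        elif os.path.isdir(path) and not os.path.islink(path):
            # Real directory: remove it and everything below it.
            shutil.rmtree(path)
        else:
            # Plain file or symlink: remove just that entry.
            os.remove(path)
    return legacy
```

Running it once with the default dry_run=True against each root, checking the printed list, and only then calling it again with dry_run=False keeps the destructive step deliberate.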
If this is ready to be pushed to staging, can you move this to server ops or open a new bug there? I am not quite sure how you guys handle bugs like this (open new vs. move).
We would normally file an IT bug for it and reference it here. Lars will do so.
we now depend on 462942
We have pushed to production with the new file system. In my first test, my priority job was done in 40 seconds. Try it and see how it works for you...
It appears to be working.