Bug 458798 - Redesign and implement Socorro's file system storage
Status: RESOLVED FIXED
Product: Socorro
Classification: Server Software
Component: General
Version: Trunk
Platform: x86 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
QA Contact: socorro
Depends on: 462942
Reported: 2008-10-06 14:08 PDT by K Lars Lohn [:lars] [:klohn]
Modified: 2011-12-28 10:40 PST

Description K Lars Lohn [:lars] [:klohn] 2008-10-06 14:08:41 PDT
Socorro's file system storage has flaws:  

* if monitor is offline for a while, the filesystem can be overwhelmed with new entries - make the file system storage tolerant of an outage of monitor

* the current system requires periodic cleanup by deleting old unused directories - make a system that is self cleaning

* the file system code is not modular, it's tied closely to collector - make a modular set of routines or a class that can be used in a uniform manner by multiple scripts

* the standard and deferred storage schemes are not parallel.  They use different code and different layouts - these should be unified
Comment 1 K Lars Lohn [:lars] [:klohn] 2008-10-31 14:46:19 PDT
Staging deployment imminent...

Overview of upgrading to the new file system:

1 - replace and start new collector
2 - wait for the jobs table to be empty before continuing
2.1 - while waiting, new dumps will accumulate in the new file system.  There is no need to worry about overfilling it
3 - stop monitor
4 - stop processor(s)
5 - replace monitor
6 - replace processor(s)
7 - start new processor(s)
8 - start new monitor
8.1 - there will temporarily be a higher than normal load while the processors catch up on the dumps that accumulated after step 2
9 - replace deferredcleanup cron job
10 - schedule new deferredcleanup cron job for daily runs
11 - a couple of weeks later, manually clean up legacy storage
11.1 - the new and old file systems share the same roots.  Delete everything outside the 'name' and 'date' directories in those roots
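
Step 11.1 could be sketched roughly as follows. This is only an illustration, not a script from Socorro: the function names, the `dry_run` flag, and the assumption that 'name' and 'date' sit directly under each storage root are all hypothetical. Running it with `dry_run=True` first lists what would be deleted without touching anything.

```python
import os
import shutil

# Top-level directories that belong to the new layout (per step 11.1).
KEEP = {"name", "date"}


def legacy_entries(root):
    """Return sorted paths directly under root that are outside the new layout."""
    return sorted(
        os.path.join(root, entry)
        for entry in os.listdir(root)
        if entry not in KEEP
    )


def clean_legacy_storage(root, dry_run=True):
    """Delete legacy entries under root; with dry_run=True only report them."""
    doomed = legacy_entries(root)
    for path in doomed:
        if dry_run:
            continue
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)   # legacy directory tree
        else:
            os.remove(path)       # stray file or symlink
    return doomed
```

Calling `clean_legacy_storage(root)` once per storage root and reviewing the returned list before rerunning with `dry_run=False` keeps the manual step reversible until the last moment.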

Details:

collector changes
	replace socorro with the latest
	copy .../socorro/collector/modpython-collector.py to collector.py
	new configuration parameters (see http://code.google.com/p/socorro/wiki/SocorroCollector#logFilePathname)
		logFilePathname
		logFileMaximumSize
		logFileMaximumBackupHistory
		logFileLineFormatString
		logFileErrorLoggingLevel
	removed configuration parameters
		reporterURL
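
As a rough illustration of what the five new logging parameters might control, here is a minimal sketch that maps them onto Python's standard `logging.handlers.RotatingFileHandler`. The mapping, the function name `make_collector_logger`, and every default value are assumptions for illustration; the wiki page linked above is the authority on how the collector actually interprets them.

```python
import logging
import logging.handlers


def make_collector_logger(logFilePathname,
                          logFileMaximumSize=1000000,        # assumed: bytes
                          logFileMaximumBackupHistory=50,    # assumed: rotated files kept
                          logFileLineFormatString="%(asctime)s %(levelname)s - %(message)s",
                          logFileErrorLoggingLevel=logging.INFO,
                          name="collector"):
    """Build a logger that writes to a size-rotated file (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handler decide what is written
    handler = logging.handlers.RotatingFileHandler(
        logFilePathname,
        maxBytes=logFileMaximumSize,
        backupCount=logFileMaximumBackupHistory,
    )
    handler.setLevel(logFileErrorLoggingLevel)
    handler.setFormatter(logging.Formatter(logFileLineFormatString))
    logger.addHandler(handler)
    return logger
```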

monitor changes
	replace socorro with the latest
	new configuration parameters
		saveSuccessfulMinidumpsTo (see http://code.google.com/p/socorro/wiki/SocorroMonitor#saveSuccessfulMinidumpsTo)
		saveFailedMinidumpsTo (see http://code.google.com/p/socorro/wiki/SocorroMonitor#saveFailedMinidumpsTo)
	removed configuration parameters
		dumpDirDelta
		dateDirDelta
		cleanupDirectoryLoopDelay
		saveMinidumpsTo
		saveFailedMinidumps
		saveProcessedMinidumps
		debug
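
Before swapping in the new monitor, it may help to check an existing configuration for the parameters removed above. The sketch below is a hypothetical helper, not part of Socorro: it assumes the configuration's parameter names are available as a plain list of strings.

```python
# Monitor parameters removed in this change, copied from the list above.
REMOVED_MONITOR_PARAMETERS = frozenset([
    "dumpDirDelta",
    "dateDirDelta",
    "cleanupDirectoryLoopDelay",
    "saveMinidumpsTo",
    "saveFailedMinidumps",
    "saveProcessedMinidumps",
    "debug",
])


def find_removed_parameters(config_names, removed=REMOVED_MONITOR_PARAMETERS):
    """Return the sorted names in config_names that no longer exist."""
    return sorted(set(config_names) & removed)
```

Note that the exact-name match matters here: `saveFailedMinidumps` is removed, while `saveFailedMinidumpsTo` is one of the new parameters.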

Processor changes:
	replace socorro with the latest
	removed configuration parameters
		debug
Comment 2 Aravind Gottipati [:aravind] 2008-11-03 14:57:42 PST
If this is ready to be pushed to staging, can you move this to server ops or open a new bug there?  I am not quite sure how you guys handle bugs like this (open new vs. move).
Comment 3 Michael Morgan [:morgamic] 2008-11-03 16:05:28 PST
We would normally file an IT bug for it and reference it here.  Lars will do so.
Comment 4 K Lars Lohn [:lars] [:klohn] 2008-11-03 16:26:47 PST
We now depend on bug 462942.
Comment 5 K Lars Lohn [:lars] [:klohn] 2008-11-10 14:49:57 PST
We have pushed to production with the new file system. In my first test, my priority job was done in 40 seconds.  Try it and see how it works for you...
Comment 6 K Lars Lohn [:lars] [:klohn] 2008-11-23 13:54:31 PST
It appears to be working.
