Bug 458798 - Redesign and implement Socorro's file system storage
Product: Socorro
Classification: Server Software
Component: General
Version: Trunk
Platform: x86 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
: socorro
Depends on: 462942
Reported: 2008-10-06 14:08 PDT by K Lars Lohn [:lars] [:klohn]
Modified: 2011-12-28 10:40 PST
CC List: 2 users
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Description K Lars Lohn [:lars] [:klohn] 2008-10-06 14:08:41 PDT
Socorro's file system storage has several flaws:

* if the monitor is offline for a while, the file system can be overwhelmed with new entries - make the file system storage tolerant of a monitor outage

* the current system requires periodic cleanup by deleting old, unused directories - make the system self-cleaning

* the file system code is not modular; it is tightly coupled to the collector - make a modular set of routines or a class that multiple scripts can use in a uniform way

* the standard and deferred storage schemes are not parallel: they use different code and different layouts - these should be unified
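The four goals above can be illustrated with a minimal sketch. It assumes a layout that comment 1 only hints at ('name' and 'date' directories under a shared root); everything else is hypothetical: the class `DualIndexStorage`, its method names, the two-level id fan-out and the hourly date buckets are illustrative choices, not Socorro's actual code. One class serves collector, monitor and processor alike; each entry is stored once under `name/` and referenced by a symlink under `date/`; the monitor consumes the date branch destructively, and removal prunes emptied directories, so no periodic cleanup pass is needed.

```python
import os
import shutil


class DualIndexStorage:
    """Hypothetical sketch of a modular storage class shared by the
    collector, monitor and processor. Each entry lives once under
    <root>/name/ and is referenced from <root>/date/ by a symlink."""

    def __init__(self, root):
        self.root = root
        self.name_root = os.path.join(root, "name")
        self.date_root = os.path.join(root, "date")
        os.makedirs(self.name_root, exist_ok=True)
        os.makedirs(self.date_root, exist_ok=True)

    def _name_dir(self, uuid):
        # Illustrative fan-out: spread entries over subdirectories named
        # after the first characters of the id so no directory grows huge.
        return os.path.join(self.name_root, uuid[0:2], uuid[2:4], uuid)

    def new_entry(self, uuid, payload, timestamp):
        """Collector side: store the payload and index it by hour."""
        name_dir = self._name_dir(uuid)
        os.makedirs(name_dir)  # raises if the id already exists
        with open(os.path.join(name_dir, "dump"), "wb") as f:
            f.write(payload)
        date_dir = os.path.join(self.date_root, timestamp.strftime("%Y%m%d%H"))
        os.makedirs(date_dir, exist_ok=True)
        os.symlink(name_dir, os.path.join(date_dir, uuid))

    def destructive_date_walk(self):
        """Monitor side: yield ids oldest first, deleting each date link
        and each emptied hourly bucket as it goes (self-cleaning)."""
        for bucket in sorted(os.listdir(self.date_root)):
            bucket_dir = os.path.join(self.date_root, bucket)
            for uuid in sorted(os.listdir(bucket_dir)):
                os.unlink(os.path.join(bucket_dir, uuid))
                yield uuid
            try:
                os.rmdir(bucket_dir)
            except OSError:
                pass  # a new entry arrived in this bucket mid-walk; keep it

    def get_path(self, uuid):
        return os.path.join(self._name_dir(uuid), "dump")

    def remove(self, uuid):
        """Processor side: delete an entry once it has been processed."""
        name_dir = self._name_dir(uuid)
        shutil.rmtree(name_dir)
        # Prune now-empty fan-out directories, stopping at the name root.
        parent = os.path.dirname(name_dir)
        while parent != self.name_root:
            try:
                os.rmdir(parent)
            except OSError:
                break  # still holds other entries
            parent = os.path.dirname(parent)
```

Under this sketch, the standard and deferred schemes would simply be two instances with different roots, for example `DualIndexStorage(standard_root)` and `DualIndexStorage(deferred_root)`, sharing one code path and one layout.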
Comment 1 K Lars Lohn [:lars] [:klohn] 2008-10-31 14:46:19 PDT
Staging deployment is imminent.

Overview of upgrading to the new file system:

1 - replace and start the new collector
2 - wait for the jobs table to be empty before continuing
2.1 - while waiting, new dumps accumulate in the new file system; there is no need to worry about it overfilling
3 - stop the monitor
4 - stop the processor(s)
5 - replace the monitor
6 - replace the processor(s)
7 - start the new processor(s)
8 - start the new monitor
8.1 - expect a temporarily higher than normal load while the processors catch up on the dumps that accumulated after step 2
9 - replace the deferredcleanup cron job
10 - schedule the new deferredcleanup cron job to run daily
11 - a couple of weeks later, manually clean up the legacy storage
11.1 - the new and old file systems share the same roots; delete everything in those roots except the 'name' and 'date' directories
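Step 11.1 can be sketched as a small helper. This is a hypothetical function, `clean_legacy_storage`, not part of Socorro; it assumes each storage root is passed in explicitly (the actual roots come from configuration not shown here) and that the new file system lives only in the `name` and `date` subdirectories of each root. It defaults to a dry run so the list of entries to be deleted can be reviewed before anything is removed.

```python
import os
import shutil

# Directories used by the new file system; everything else under a root
# is assumed to be legacy storage (hypothetical helper, per step 11.1).
KEEP = {"name", "date"}


def clean_legacy_storage(root, dry_run=True):
    """Delete every entry directly under root except 'name' and 'date'.

    Returns the paths that were removed or, in dry-run mode, the paths
    that would be removed.
    """
    removed = []
    for entry in sorted(os.listdir(root)):
        if entry in KEEP:
            continue
        path = os.path.join(root, entry)
        if not dry_run:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)  # legacy directory tree
            else:
                os.remove(path)  # stray file or symlink
        removed.append(path)
    return removed
```

Calling it once per root with the default `dry_run=True`, reviewing the returned paths, and then repeating with `dry_run=False` keeps the destructive step deliberate.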


Collector changes:
	replace socorro with the latest version
	copy .../socorro/collector/ to
	new configuration parameters (see
	removed configuration parameters

Monitor changes:
	replace socorro with the latest version
	new configuration parameters
		saveSuccessfulMinidumpsTo (see
		saveFailedMinidumpsTo (see
	removed configuration parameters

Processor changes:
	replace socorro with the latest version
	removed configuration parameters
Comment 2 Aravind Gottipati [:aravind] 2008-11-03 14:57:42 PST
If this is ready to be pushed to staging, can you move this bug to Server Ops or open a new bug there? I am not quite sure how you handle bugs like this (open a new one vs. move this one).
Comment 3 Michael Morgan [:morgamic] 2008-11-03 16:05:28 PST
We would normally file an IT bug for it and reference it here.  Lars will do so.
Comment 4 K Lars Lohn [:lars] [:klohn] 2008-11-03 16:26:47 PST
We now depend on bug 462942.
Comment 5 K Lars Lohn [:lars] [:klohn] 2008-11-10 14:49:57 PST
We have pushed the new file system to production. In my first test, my priority job finished in 40 seconds. Try it and see how it works for you.
Comment 6 K Lars Lohn [:lars] [:klohn] 2008-11-23 13:54:31 PST
It appears to be working.
