The "old" primaryCrashStore directories on the prod collectors need to be removed (see bug 1079642). These directories contain a lot of files, so a plain rm -rf risks causing a non-trivial I/O hit (or simply failing, depending on how large these structures are).

Some ideas for different ways to handle this:

* Normal rm -rf.
* Write a script to crawl the tree and rm -rf leaves sequentially, with sleeps sprinkled throughout.
* Use rsync to "replicate" an empty directory in-place; interesting because rsync is excellent at dealing with large filesystem structures.
* Use the -delete option of GNU find; interesting for the same reasons as rsync.

In any case we can do the nodes one at a time, draining them beforehand and adding them back when the removal is complete.
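The "crawl the tree with sleeps" idea could look roughly like the sketch below. It is only an illustration: the function name, the batch size, and the pause length are all made up for the example, and it assumes the tree may contain symlinks to directories, which are unlinked rather than followed.

```python
import os
import time


def throttled_remove(root, batch_size=100, pause_seconds=1.0, sleep=time.sleep):
    """Delete `root` bottom-up, pausing after every `batch_size` directories.

    All parameter names and defaults are hypothetical; tune them to the
    I/O budget of the node being cleaned.
    """
    removed = 0
    # topdown=False yields each directory only after its subdirectories,
    # so every directory is already empty of real subdirectories when visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # os.walk does not descend into symlinked directories, so the
            # links themselves are still present and must be unlinked here.
            if os.path.islink(path):
                os.unlink(path)
        os.rmdir(dirpath)
        removed += 1
        if removed % batch_size == 0:
            sleep(pause_seconds)  # give the disk a breather
    return removed
```

Calling `throttled_remove("/path/to/old/store", batch_size=1000, pause_seconds=0.5)` would then remove the tree in bursts of 1000 directories; the path here is a placeholder, not the real location on the collectors.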
Each day has about 64K directories in the radix tree. There are about nine months of daily directories. That totals about 17,301,504 directories to be deleted on each collector.
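For anyone checking the arithmetic: the total works out exactly if "64K" is read as 65,536 and "about nine months" as 264 daily directories. Both readings are assumptions inferred from the total, not figures stated in this bug.

```python
DIRS_PER_DAY = 64 * 1024  # "about 64K", read as exactly 65,536 (assumption)
DAILY_DIRS = 264          # "about nine months" of daily directories (assumption)


def total_directories(dirs_per_day=DIRS_PER_DAY, days=DAILY_DIRS):
    """Directories to delete on one collector."""
    return dirs_per_day * days


print(f"{total_directories():,}")  # prints 17,301,504
```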
Curious if we still need/want to do this given that we're migrating to S3. In other words, what are the chances that we'll run into an inode limit issue before we retire this infra?
It is unlikely that we'll have an inode problem. However, if something goes horribly wrong and the crashmovers are unable to move things on to S3 for several days, it is possible that we might run into inode problems. Take a look at the machines to see how close we are to hitting our heads on the inode ceiling. If we have three days of overhead, I'm not nervous. If we do not, then we ought to clean up so we have the safety buffer.
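One way to check the "three days of overhead" rule on a collector is to compare the free inode count against a daily consumption estimate. The sketch below uses `os.statvfs`; the function names are hypothetical, and the `inodes_per_day` figure has to come from observation on the machine, since this bug does not state it.

```python
import os


def free_inodes(path):
    """Inodes still available to unprivileged users on path's filesystem."""
    return os.statvfs(path).f_favail


def headroom_days(free, inodes_per_day):
    """Days until the inode ceiling if nothing is moved off the disk."""
    return free / inodes_per_day


def has_safety_buffer(path, inodes_per_day, min_days=3):
    # min_days=3 mirrors the three-day threshold suggested above.
    return headroom_days(free_inodes(path), inodes_per_day) >= min_days
```

The same numbers are available interactively from `df -i`; the script form is only useful if the check needs to run across all collectors.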
All of the collectors have been cleaned. I did half of them with the find method, and the other half with the rsync method - the latter was very slightly more efficient, as it turns out. Neat.
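The exact commands used are not recorded in this bug. As a rough sketch, the two methods are commonly invoked along the following lines; the helper names and paths are hypothetical, and only the `find -delete` and `rsync -a --delete` options themselves are standard.

```python
def find_delete_command(target):
    # GNU find: -delete implies -depth, so contents are removed before
    # their parent directories, and the target itself goes last.
    return ["find", target, "-delete"]


def rsync_empty_command(empty_dir, target):
    # Trailing slashes make rsync sync directory contents; --delete then
    # removes everything in target that is absent from the empty source.
    # The (now empty) target directory itself is left in place.
    return [
        "rsync", "-a", "--delete",
        empty_dir.rstrip("/") + "/",
        target.rstrip("/") + "/",
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a drained node.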
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED