Closed Bug 566340 Opened 14 years ago Closed 13 years ago

Need Map Reduce job to clean up pre-1.7 data

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dre, Assigned: tmary)

Details

Currently the rowkeys are not salted.  We need to have a job that fixes the old data so it can be retrieved via the 1.7 codebase.
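For anyone unfamiliar with salting, here is a minimal sketch of the general idea: a short prefix derived from the key is prepended so that writes spread across regions. This is not necessarily the scheme the 1.7 codebase uses. The function name `salted_rowkey`, the `buckets` parameter, and the choice of MD5 are all assumptions made for illustration.

```python
import hashlib


def salted_rowkey(key: str, buckets: int = 16) -> str:
    """Return `key` prefixed with a one-hex-digit salt.

    Hypothetical helper: the salt is derived from a hash of the key, so the
    same key always maps to the same salted rowkey and can be looked up again.
    """
    if not 1 <= buckets <= 16:
        raise ValueError("buckets must be between 1 and 16")
    # The first digest byte, reduced modulo the bucket count, picks the salt.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    salt = digest[0] % buckets
    return f"{salt:x}{key}"
```

Because the salt is computed from the key itself, a reader that knows the key can recompute the full rowkey, and a cleanup job would only need to rewrite each old row under its salted key.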
Severity: normal → critical
Also, we'll want to draw out the new attributes such as ids:ooid and timestamps:*.
One big MR that is finished before 2.0 is the best way to go for this.
Summary: Need Map Reduce job to fix rowkeys on crash_reports table → Need Map Reduce job to clean up pre-1.7 data
Version: 1.7 → 2.0
https://wiki.mozilla.org/Socorro_hbase_cleanup

The wiki page contains details on all the required changes that need to happen before 1.8 release.
Assignee: aphadke → deinspanjer
Daniel, did this get done?

If not, is it still relevant?  I'm guessing it's no longer critical.
Severity: critical → normal
Has not been done yet, still needs to be done, not critical at the moment.
Assignee: deinspanjer → xstevens
Not really trying to argue against this, but aren't we also talking about removing data that is over 6 months old? Because if that was the case we might as well delete these as part of that process rather than update them.
The pre-1.7 data isn't inside HBase at all; it is all in the region directories that have no corresponding row in .META., so whatever our TTL strategy is won't affect them.
I don't think we'll need a MapReduce job for this at all. I've written some code that will give us all of the region directories in HDFS that .META. doesn't know about.
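The code itself is not attached to this bug, but the comparison it describes amounts to a set difference. A rough sketch follows, assuming the region directory names from HDFS and the region names known to .META. have already been collected into plain lists by some other means. The function name `orphan_region_dirs` and both parameter names are hypothetical.

```python
def orphan_region_dirs(hdfs_region_dirs, meta_region_names):
    """Return region directory names present in HDFS but unknown to .META.

    Both arguments are iterables of strings. How they are gathered (an HDFS
    listing, a scan of .META.) is outside the scope of this sketch.
    """
    known = set(meta_region_names)
    # Deduplicate the HDFS listing, keep only names .META. does not know,
    # and sort for stable output.
    return sorted(d for d in set(hdfs_region_dirs) if d not in known)
```

The result is the list of directories that could be moved out of the way or archived.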
HBase actually now detects this automatically. I moved the directories out of the way on secondary and everything seems to work fine.
Cool.  How much disk space does the offline data represent?  If possible, we should try to pull it out of HDFS and onto tape so we can archive it until we are given clearance to delete it entirely.
6.8TB unreplicated
Could you file an IT bug and cc laura, asking IT if they have any ideas for how we could archive this to tape?
Assignee: xstevens → tmeyarivan
Existing archive has been deleted after socorro-dev verified that it is not needed.

--
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro