Closed
Bug 566340
Opened 14 years ago
Closed 13 years ago
Need Map Reduce job to clean up pre-1.7 data
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dre, Assigned: tmary)
Details
Currently the rowkeys are not salted. We need to have a job that fixes the old data so it can be retrieved via the 1.7 codebase.
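For illustration, the difference between an unsalted and a salted row key can be sketched as below. This is a minimal sketch, not the Socorro implementation: the function names are hypothetical, and the layout (one salt character taken from the start of the ooid, then the six-character date suffix of the ooid, then the ooid itself) is an assumption made for the example.

```python
def unsalted_row_key(ooid: str) -> str:
    """Assumed pre-1.7 layout: <YYMMDD><ooid>; keys for one day sort together."""
    if len(ooid) < 7:
        raise ValueError("ooid too short to carry a six-character date suffix")
    return f"{ooid[-6:]}{ooid}"


def salted_row_key(ooid: str) -> str:
    """Assumed 1.7 layout: <salt><YYMMDD><ooid>.

    The salt here is the first character of the ooid (an assumption), which
    spreads writes for a single day across several key ranges.
    """
    if len(ooid) < 7:
        raise ValueError("ooid too short to carry a six-character date suffix")
    return f"{ooid[0]}{ooid[-6:]}{ooid}"


def resalt(old_key: str) -> str:
    """Turn an assumed old-format key into the assumed salted form."""
    ooid = old_key[6:]  # strip the leading YYMMDD prefix
    return salted_row_key(ooid)
```

A cleanup job along these lines would read each old row, compute `resalt(old_key)`, and write the row back under the new key so that a salted lookup can find it.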
Reporter
Updated•14 years ago
Severity: normal → critical
Reporter
Comment 1•14 years ago
Also, we'll want to draw out the new attributes such as ids:ooid and timestamps:*. One big MR that is finished before 2.0 is the best way to go for this.
Summary: Need Map Reduce job to fix rowkeys on crash_reports table → Need Map Reduce job to clean up pre-1.7 data
Version: 1.7 → 2.0
Comment 2•14 years ago
https://wiki.mozilla.org/Socorro_hbase_cleanup The wiki page contains details on all the required changes that need to happen before the 1.8 release.
Updated•14 years ago
Assignee: aphadke → deinspanjer
Comment 3•13 years ago
Daniel, did this get done? If not, is it still relevant? I'm guessing it's no longer critical.
Severity: critical → normal
Reporter
Comment 4•13 years ago
Has not been done yet, still needs to be done, not critical at the moment.
Assignee: deinspanjer → xstevens
Comment 5•13 years ago
Not really trying to argue against this, but aren't we also talking about removing data that is over 6 months old? Because if that was the case we might as well delete these as part of that process rather than update them.
Reporter
Comment 6•13 years ago
The pre-1.7 data isn't inside HBase at all; it is all the region directories that have no corresponding row in .META., so whatever our TTL strategy is won't affect them.
Comment 7•13 years ago
I don't think we'll need a MapReduce job for this at all. I've written some code that will give us all of the region directories in HDFS that .META. doesn't know about.
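The check described here amounts to a set difference between the region directory names found in HDFS and the region names recorded in .META. The following is only a sketch of that comparison, not the code referred to in this comment: the function name is hypothetical, both inputs are assumed to have been collected already (for example from a directory listing of the table and a scan of .META.), and skipping dot-prefixed entries is an assumption about which entries are not region directories.

```python
from typing import Iterable, List


def orphaned_region_dirs(
    hdfs_dir_names: Iterable[str],
    meta_encoded_names: Iterable[str],
) -> List[str]:
    """Return directory names present in HDFS but absent from .META.

    hdfs_dir_names: names of the entries under the table directory.
    meta_encoded_names: encoded region names read from .META.
    """
    known = set(meta_encoded_names)
    return sorted(
        name
        for name in set(hdfs_dir_names)
        # Assumption: dot-prefixed entries are bookkeeping, not regions.
        if not name.startswith(".") and name not in known
    )
```

For example, `orphaned_region_dirs(["70236052", "1028785192", ".tmp"], ["70236052"])` reports only `1028785192` as a directory that .META. does not know about.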
Comment 8•13 years ago
HBase actually now detects this automatically. I moved the directories out of the way on secondary and everything seems to work fine.
Reporter
Comment 9•13 years ago
Cool. How much disk space does the offline data represent? If possible, we should try to pull it out of HDFS and onto tape so we can archive it until we are given clearance to delete entirely.
Comment 10•13 years ago
6.8TB unreplicated
Reporter
Comment 11•13 years ago
Could you file an IT bug and cc laura, asking IT if they have any ideas for how we could archive this to tape?
Updated•13 years ago
Assignee: xstevens → tmeyarivan
Assignee
Comment 12•13 years ago
The existing archive has been deleted after socorro-dev verified that it is not needed.
Assignee
Updated•13 years ago
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro