Closed
Bug 599352
Opened 14 years ago
Closed 14 years ago
Hbase region of ooids starting with 8 is broken
Categories
(Socorro :: General, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: laura, Assigned: dre)
Details
Attachments
(1 file)
35.18 KB, text/plain
No description provided.
Reporter
Comment 1•14 years ago
From Daniel's email: "we discovered that all of the latest failures are for records that have keys beginning with "8". You probably remember that our rowkeys are formatted as <first hex char of guid><date><guid> so that pointed at a bad region. I ran the version of check_meta.rb that we have installed, and it did in fact discover a hole in the meta for that range. Unfortunately, it throws an exception when attempting to --fix the problem (listed below). I checked several of the region ids listed below, and most of them are those old regions that we archived back in June. The data is still sitting in the /hbase/crash_reports table and taking up space and (it appears) getting in the way of check_meta.rb. At the point we are at now, it would be acceptable to just delete these old regions *if* we had a good way to figure out which is which and do so."
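A minimal Python sketch of the rowkey layout described above, assuming the date is passed as a 6-digit yymmdd string like the "100227" prefix seen in the old keys. The name `make_row_key` is hypothetical, not Socorro code. It shows why a single bad region can break only ooids starting with "8": the salt is just the ooid's first hex character, so all of those keys sort into the same key range.

```python
import uuid


def make_row_key(ooid: str, date_yymmdd: str) -> str:
    """Build a salted crash_reports row key: <first hex char of ooid><date><ooid>.

    Assumption: the date is a 6-digit yymmdd string, e.g. "100227" for 2010-02-27.
    """
    uuid.UUID(ooid)  # raises ValueError if the ooid is not a well-formed UUID
    if len(date_yymmdd) != 6 or not date_yymmdd.isdigit():
        raise ValueError("date must be a 6-digit yymmdd string")
    ooid = ooid.lower()
    # The salt is one of 16 hex characters, so rows spread over 16 key ranges;
    # every ooid starting with "8" lands in the same "8..." range.
    salt = ooid[0]
    return salt + date_yymmdd + ooid
```

For example, `make_row_key("8c7538ac-c9cf-4fb0-babb-eeee22100227", "100227")` returns "81002278c7538ac-c9cf-4fb0-babb-eeee22100227", which sorts into the "8" key range.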
Assignee: nobody → deinspanjer
Priority: -- → P1
Comment 2•14 years ago
Daniel, I looked at the meta_before_excise.txt file. What if we ran an MR job that deleted all keys with the salt '<hex>0100610'? Some of the keys might belong to the new regions, but given we have the data archived on NFS, it'll at least buy us some room.
Assignee
Comment 3•14 years ago
The old keys are the ones that start with 10, not ones with a salt char. We don't have a backup of the data; we don't have an NFS share with 10 TB of data anywhere. Each region has a name that starts with tablename,startkey, but that is not how regions are stored in HDFS: they are stored under their "encoded name", which is an integer. In that file, the integer is the one after the readable name.
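To illustrate the distinction drawn here, a hypothetical helper can tell the two key layouts apart by where the ooid's first dash falls: position 14 for an old <yymmdd><ooid> key, position 15 for a salted <salt><yymmdd><ooid> key. This is a sketch assuming full row keys built from standard dashed UUIDs; `key_format` is an illustrative name, not part of Socorro.

```python
def key_format(row_key: str) -> str:
    """Classify a full crash_reports row key as 'old', 'salted', or 'unknown'.

    Assumes keys embed a dashed UUID whose first group is 8 hex chars.
    """
    dash = row_key.find("-")  # -1 if there is no dash at all
    if dash == 14 and row_key[:6].isdigit():
        return "old"      # <yymmdd><ooid>: 6 date digits + 8 hex chars before the dash
    if dash == 15 and row_key[1:7].isdigit():
        return "salted"   # <salt><yymmdd><ooid>: one extra leading salt char
    return "unknown"
```

Checking the dash position rather than only the leading "10" avoids confusing an old key with a salted key whose salt character happens to be "1".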
Comment 4•14 years ago
What if we parsed the file at /home/deinspenjer/meta_before_excise.txt looking for start and end rows beginning with '1002*', i.e., found keys in the old format and then grabbed the corresponding integer? Would that help with cleanup?
Assignee
Comment 5•14 years ago
Yes, that is the track I was thinking of: for regions starting with crash_reports,100*, get the encoded region name and delete those files in HDFS.
Comment 6•14 years ago
Alright, I am working on it and will update the ticket once it's done. What's the impact if we accidentally delete the wrong region?
Reporter
Comment 7•14 years ago
(In reply to comment #6)
> alright, i am working on it, will update the ticket once its done, whats the
> impact if we accidently delete a wrong region?

Correct me if I'm wrong, but unrecoverable data loss?
Assignee
Comment 8•14 years ago
Permanently lost data. Don't write something that does the delete; write something that just outputs the HDFS paths we wish to delete. Then we can spot-check the list and feed it to hadoop fs rm.
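A dry-run sketch of the approach agreed in comments 5 and 8: it reads a meta dump and only yields HDFS paths, never touching HDFS. The line format ('<table>,<startkey>,<regionid> <encoded name>'), the /hbase root, and the "100" prefix are assumptions drawn from the comments above, not a verified description of meta_before_excise.txt; `paths_to_delete` is a hypothetical name.

```python
from typing import Iterable, Iterator

HBASE_ROOT = "/hbase"  # assumed HBase root directory in HDFS


def paths_to_delete(lines: Iterable[str], table: str = "crash_reports",
                    old_prefix: str = "100") -> Iterator[str]:
    """Yield HDFS region directories whose region start key is old-format.

    Assumes each line looks like '<table>,<startkey>,<regionid> <encoded_name> ...'.
    Nothing is deleted; paths are only yielded so they can be reviewed.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        readable, encoded = fields[0], fields[1]
        if not encoded.isdigit():
            continue  # the encoded region name is expected to be an integer
        tbl, _, rest = readable.partition(",")
        start_key = rest.rsplit(",", 1)[0]  # drop the trailing region id
        if tbl == table and start_key.startswith(old_prefix):
            yield f"{HBASE_ROOT}/{table}/{encoded}"
```

Printing each path from `paths_to_delete(open("meta_before_excise.txt"))` gives a plain list that can be spot-checked before anything is handed to `hadoop fs`.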
Comment 9•14 years ago
Yup. I will only be printing the paths, no HDFS operations.
Comment 10•14 years ago
Comment 11•14 years ago
Sample values below; see the attachment in comment 10 for the full set.
100227ac7538ac-c9cf-4fb0-babb-eeee22100227 1267328647632
1002279bdf3f1c-ee87-42b2-96d3-7845d2100227 1267328345569
100227c739deeb-daca-4ec5-a939-a67262100227 1267327858753
100227b74ccca5-d0a6-4252-9369-c240a2100227 1267331732018
100226b22f9efa-796c-49b1-8826-116662100226 1267253586726
Comment 12•14 years ago
Patrick Angels (from Cloudera) suggested we try renaming these regions instead of deleting them outright. Thoughts?
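If the regions are renamed rather than deleted, the same reviewed path list could drive a reversible move instead. A minimal sketch, assuming a hypothetical quarantine directory /hbase_quarantine outside the table directory; `rename_plan` only builds (source, destination) pairs and performs no HDFS operations.

```python
import posixpath
from typing import Iterable, List, Tuple


def rename_plan(region_dirs: Iterable[str],
                quarantine_root: str = "/hbase_quarantine") -> List[Tuple[str, str]]:
    """Map each region directory to a destination under a quarantine root.

    Assumes inputs look like '<hbase root>/<table>/<encoded name>'.
    """
    plan = []
    for src in region_dirs:
        src = src.rstrip("/")
        table = posixpath.basename(posixpath.dirname(src))  # e.g. "crash_reports"
        encoded = posixpath.basename(src)                     # encoded region name
        plan.append((src, posixpath.join(quarantine_root, table, encoded)))
    return plan
```

Keeping the table name and encoded name in the destination makes it straightforward to map each moved directory back to its original location if it turns out to be needed.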
Assignee
Comment 13•14 years ago
The region was brought back online without having to do anything with these extra regions. That said, we should take a look at what to do with them in another bug.
Reporter
Comment 14•14 years ago
Daniel, can I close this?
Assignee
Comment 15•14 years ago
Yes, it is ready to be closed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro