Closed Bug 599352 Opened 14 years ago Closed 14 years ago

Hbase region of ooids starting with 8 is broken

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: laura, Assigned: dre)

Details

Attachments

(1 file)

full hdfs integers, row_key\thdfs_integer 14 years ago Anurag Phadke[:aphadke@mozilla.com] 35.18 KB, text/plain		Details

Laura Thomson :laura

Reporter

Description

•

14 years ago

      No description provided.

Laura Thomson :laura

Reporter

Comment 1

•

14 years ago

From Daniel's email:
"we discovered that all of the latest failures are for records that have keys beginning with "8".  You probably remember that our rowkeys are formatted as <first hex char of guid><date><guid> so that pointed at a bad region.

I ran the version of check_meta.rb that we have installed, and it did in fact discover a hole in the meta for that range. Unfortunately, it throws an exception when attempting to --fix the problem (listed below).  I checked several of the region ids listed below, and most of them are those old regions that we archived back in June.  The data is still sitting in the /hbase/crash_reports table and taking up space and (it appears) getting in the way of check_meta.rb.

At the point we are at now, it would be acceptable to just delete these old regions *if* we had a good way to figure out which is which and do so."
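A minimal sketch of the row key layout described in the email above. The names `make_rowkey` and `salt_of`, the six-digit date shape, and the validation are assumptions inferred from the sample keys later in this bug; none of this is code from Socorro.

```python
import re

HEX_CHARS = "0123456789abcdef"


def make_rowkey(guid: str, date: str) -> str:
    """Build a key shaped like <first hex char of guid><date><guid>.

    Assumption: `date` is six digits (e.g. "100227"), matching the
    prefix seen on the sample keys in this bug.
    """
    if not guid or guid[0].lower() not in HEX_CHARS:
        raise ValueError("guid must start with a hex character")
    if not re.fullmatch(r"\d{6}", date):
        raise ValueError("date must be six digits")
    salt = guid[0].lower()
    return salt + date + guid


def salt_of(rowkey: str) -> str:
    """Return the leading salt character of a salted row key."""
    return rowkey[0]


# The guid below is borrowed from the sample values in comment 11.
key = make_rowkey("ac7538ac-c9cf-4fb0-babb-eeee22100227", "100227")
print(key, salt_of(key))
```

Since the salt is the first character of the key, all keys sharing a first guid character sort into one contiguous key range, which is consistent with failures for keys beginning with "8" pointing at a single bad region.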

Assignee: nobody → deinspanjer

Priority: -- → P1

Anurag Phadke[:aphadke@mozilla.com]

Comment 2

•

14 years ago

Daniel, I looked at the meta_before_excise.txt file. What if we ran an MR job that deleted all keys with the salt '<hex>0100610'?
Some of the keys might belong to the new regions, but given that we have the data archived on NFS, it would at least buy us some room.

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 3

•

14 years ago

The old keys are ones that start with 10, not ones with a salt char.
We don't have a backup of the data, and we don't have an NFS with 10 TB of data anywhere.
Each region has a name that starts with tablename,startkey, but that is not how they are stored in HDFS. In HDFS they are stored under the "encoded name", which is an integer. In that file, the integer is the one after the readable name.

Anurag Phadke[:aphadke@mozilla.com]

Comment 4

•

14 years ago

What if we parsed the file at /home/deinspenjer/meta_before_excise.txt, looking for the start and end rows with '1002*' (i.e. keys in the old format), and then grabbed the corresponding integer? Would that help with the cleanup?

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 5

•

14 years ago

Yes, that is the track I was thinking of. For regions whose names start with crash_reports,100*, get the encoded region name, then delete those files in HDFS.
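A rough sketch of the selection step, assuming a readable region name is a comma-separated string that begins with the table name and the start key, as described in comment 3. The helper names and the trailing field in the example strings are hypothetical, and the real layout of meta_before_excise.txt is not shown in this bug.

```python
def split_region_name(region_name: str) -> tuple[str, str]:
    """Return (table, start_key) from a name shaped like 'table,startkey,...'.

    Assumption: neither the table name nor the start key contains a comma.
    """
    parts = region_name.split(",", 2)
    if len(parts) < 2:
        raise ValueError(f"not a region name: {region_name!r}")
    return parts[0], parts[1]


def is_old_region(region_name: str,
                  table: str = "crash_reports",
                  old_prefix: str = "100") -> bool:
    """True when the region belongs to `table` and its start key uses the
    old, unsalted format (start key begins with `old_prefix`)."""
    name_table, start_key = split_region_name(region_name)
    return name_table == table and start_key.startswith(old_prefix)


# Hypothetical region names; the last field is a placeholder.
names = [
    "crash_reports,100227ac7538ac-c9cf-4fb0-babb-eeee22100227,0",
    "crash_reports,a100227ac7538ac-c9cf-4fb0-babb-eeee22100227,0",
]
for name in names:
    print(is_old_region(name), name)
```

A salted key always starts with a hex character followed by the date, so under these assumptions a start key beginning with "100" can only come from the old format.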

Anurag Phadke[:aphadke@mozilla.com]

Comment 6

•

14 years ago

Alright, I am working on it and will update the ticket once it's done. What's the impact if we accidentally delete a wrong region?

Laura Thomson :laura

Reporter

Comment 7

•

14 years ago

(In reply to comment #6)
> Alright, I am working on it and will update the ticket once it's done. What's
> the impact if we accidentally delete a wrong region?

Correct me if I'm wrong, but unrecoverable data loss?

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 8

•

14 years ago

Permanently lost data. Don't write something that does the delete; write something that just outputs the HDFS paths we wish to delete. Then we can spot check the list and feed it to hadoop fs rm.
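One way the print-only approach could look, as a sketch rather than the script that was actually written. It assumes input lines shaped like the attachment on this bug (row key, whitespace, integer) and assumes each region lives in a directory named after that integer under /hbase/crash_reports. The function name and defaults are hypothetical.

```python
def candidate_paths(lines,
                    table_dir: str = "/hbase/crash_reports",
                    old_prefix: str = "100"):
    """Return HDFS paths for old-format regions. Nothing is deleted.

    Each line is expected to hold a row key and an integer separated by
    whitespace. Lines that do not match that shape are skipped, so an
    unexpected line can never turn into a path.
    """
    paths = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        row_key, hdfs_integer = fields
        if not row_key.startswith(old_prefix):
            continue  # salted (new-format) key: leave it alone
        if not hdfs_integer.isdigit():
            continue  # not a plain integer: leave it for manual review
        paths.append(f"{table_dir}/{hdfs_integer}")
    return paths


# One old-format row from comment 11 plus a made-up salted key.
rows = [
    "100227ac7538ac-c9cf-4fb0-babb-eeee22100227\t1267328647632",
    "a100227ac7538ac-c9cf-4fb0-babb-eeee22100227\t42",
]
for path in candidate_paths(rows):
    print(path)
```

Keeping the output as a plain list of paths leaves the destructive step to a person, who can spot check the list before anything is removed.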

Anurag Phadke[:aphadke@mozilla.com]

Comment 9

•

14 years ago

Yup. It will only print the paths, with no HDFS operations.

Anurag Phadke[:aphadke@mozilla.com]

Comment 10

•

14 years ago

Attached file full hdfs integers, row_key\thdfs_integer — Details

Anurag Phadke[:aphadke@mozilla.com]

Comment 11

•

14 years ago

Sample values are below; see the attachment in comment #10 for the full set.

100227ac7538ac-c9cf-4fb0-babb-eeee22100227      1267328647632
1002279bdf3f1c-ee87-42b2-96d3-7845d2100227      1267328345569
100227c739deeb-daca-4ec5-a939-a67262100227      1267327858753
100227b74ccca5-d0a6-4252-9369-c240a2100227      1267331732018
100226b22f9efa-796c-49b1-8826-116662100226      1267253586726

Anurag Phadke[:aphadke@mozilla.com]

Comment 12

•

14 years ago

Patrick Angels (from Cloudera) suggested we try renaming these regions instead of deleting them outright. Thoughts?

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 13

•

14 years ago

The region was brought back online without having to do anything with these extra regions.  That said, we should take a look at what to do with them in another bug.

Laura Thomson :laura

Reporter

Comment 14

•

14 years ago

Daniel, can I close this?

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 15

•

14 years ago

Yes, it is ready to be closed.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Updated

•

13 years ago

Component: Socorro → General

Product: Webtools → Socorro

Categories

(Socorro :: General, task, P1)