Closed Bug 626328 Opened 15 years ago Closed 15 years ago

Socorro - collectors can fail to send to disk

Categories

(Socorro :: General, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lars, Assigned: lars)

Details

Attachments

(1 file)

Something was different in tonight's HBase downtime. We discovered that the collectors were not writing crashes to the file system while HBase was down. The fallback system was designed to fall back to the file system when HBase times out or fails. It does not cope well when HBase isn't there in the first place. We need to find out how this HBase outage differed from previous ones. The collector fails during initialization when HBase isn't there, so it never even gets to the point of trying to accept a crash. I patched the code as a temporary solution, and the patch works, but a better solution to this problem should be engineered.
Assignee: nobody → lars
Target Milestone: --- → 1.7.6
How does it fail? How was HBase differently absent than usual? Can you attach your patch to the bug?
The CollectorCrashStorageSystemForHBase class relies on having an hbaseConnection to do its work. If that connection was never created, the class cannot do anything, including sending crashes to fallback storage. The solution is to never let the HBaseClient constructor fail: NoConnection exceptions must be handled within the constructor and not allowed to propagate outward. By allowing the constructor to complete with a bad connection to HBase, we allow the reconnection mechanisms in later method calls to do their work.
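To illustrate the idea in the comment above, here is a minimal sketch, not the actual patch attached to this bug. All names in it (NoConnectionError, LazyClient, CollectorStorage, the connect callable) are hypothetical stand-ins, and plain dicts stand in for both HBase and the file system. It assumes the client retries the connection on each call and that the storage layer catches the connection error and writes to the fallback.

```python
class NoConnectionError(Exception):
    """Hypothetical error meaning the primary store cannot be reached."""


class LazyClient:
    """Stand-in for an HBase client whose constructor never raises."""

    def __init__(self, connect):
        # `connect` is a hypothetical callable that returns a connection
        # object or raises NoConnectionError.
        self._connect = connect
        self.connection = None
        try:
            self._reconnect()
        except NoConnectionError:
            # Handle the failure here so construction always completes.
            # Later calls get another chance to connect.
            pass

    def _reconnect(self):
        self.connection = None
        self.connection = self._connect()

    def put(self, crash_id, payload):
        if self.connection is None:
            # May raise NoConnectionError; the caller decides what to do.
            self._reconnect()
        self.connection[crash_id] = payload


class CollectorStorage:
    """Stand-in for the collector's storage layer with a fallback store."""

    def __init__(self, client, fallback):
        self.client = client
        self.fallback = fallback  # dict standing in for the file system

    def save(self, crash_id, payload):
        try:
            self.client.put(crash_id, payload)
            return "primary"
        except NoConnectionError:
            # The primary store is unreachable: keep the crash locally.
            self.fallback[crash_id] = payload
            return "fallback"
```

Because the constructor completes even when the first connection attempt fails, the storage object always exists, so save() can still reach the fallback path and can return to the primary store once it is reachable again.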
Attachment #504844 - Flags: review?(rhelmer)
Attachment #504844 - Flags: review?(rhelmer) → review+
This is fixed in release 1.7.5.6 and is currently in production and verified to work. The fix has been ported forward to 1.7.6.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #3)
> This is fixed in release 1.7.5.6 and is currently in production and verified to
> work. The fix has been ported forward to 1.7.6.

Lars and I tested the trunk (1.7.6) version on staging, too.
Component: Socorro → General
Product: Webtools → Socorro