Socorro Collector needs timeout on hbase calls

RESOLVED FIXED in 1.7

Status

Product: Socorro
Component: General
Priority: P1
Severity: normal
Status: RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: lars, Assigned: dre)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 years ago
On 2010-04-08 the collector stalled because hbase calls were taking too long.  In 1.7, hbase becomes the primary storage and we cannot afford such a failure.

With timeouts, we can fall back to local storage quickly.  The collector will not bog down, and we won't lose crashes like we did.
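
A rough sketch of the fallback idea, not the collector's actual code: the names store_crash, primary_store and fallback_store are hypothetical, and socket.timeout stands in for whatever exception a timed-out hbase call raises (that depends on the Thrift version in use).

```python
import socket


def store_crash(crash_id, payload, primary_store, fallback_store):
    """Try the primary store; on a timeout, use the fallback instead.

    Returns the name of the store that accepted the crash.
    """
    try:
        primary_store(crash_id, payload)
        return "primary"
    except socket.timeout:
        # The primary did not answer in time: do not block the collector,
        # hand the crash to the fallback so it can be resubmitted later.
        fallback_store(crash_id, payload)
        return "fallback"


if __name__ == "__main__":
    local = {}

    def slow_hbase(crash_id, payload):
        # Stand-in for an hbase call that exceeded its timeout.
        raise socket.timeout("timed out")

    def local_disk(crash_id, payload):
        # Stand-in for local storage (a dict instead of the file system).
        local[crash_id] = payload

    print(store_crash("abc123", b"dump", slow_hbase, local_disk))
```

Only the timeout is caught here, so other failures still surface to the caller; whether those should also fall back is a separate decision.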

Can the timeouts be implemented in the thrift layer?
(Reporter)

Updated

8 years ago
Priority: -- → P1
Target Milestone: --- → 1.7

Updated

8 years ago
Assignee: nobody → deinspanjer
Anurag and I will hit up people on the #hbase channel to see what we can get done with this.
The Python version of hbaseClient appears to have no timeout set, so a call can block indefinitely. I would assume/hope that adding a socket timeout will resolve this issue.

Adding the following line directly below line #69, i.e. below
transport = self.tsocketModule.TSocket(self.host, self.port)
should do the trick:

transport.setTimeout(1000)  # in ms
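
To illustrate what such a timeout buys, here is a standard-library sketch that does not use Thrift or hbase at all. It assumes the Thrift timeout ends up as an ordinary socket timeout on the underlying connection; read_with_timeout is a hypothetical helper, and the "server" is just a local listener that never answers.

```python
import socket
import time


def read_with_timeout(host, port, timeout_ms):
    """Connect, then wait for a reply for at most timeout_ms milliseconds.

    Returns the bytes received, or None if the peer did not answer in time.
    """
    sock = socket.create_connection((host, port), timeout=timeout_ms / 1000.0)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    # A listener that never accepts or answers stands in for a stalled server.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    host, port = server.getsockname()

    start = time.time()
    result = read_with_timeout(host, port, 1000)
    # Print the result and roughly how long the call was allowed to block.
    print(result, round(time.time() - start, 1))
    server.close()
```

Without the timeout argument the recv call would wait on the silent listener indefinitely, which is the behaviour described above for the collector.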

I have checked out the code from:
http://code.google.com/p/socorro/source/browse/trunk/socorro/hbase/ and made the change, but I am not sure how to test it locally.

Daniel - Can you let me know how to test it (mainly config details, path etc.) once you have some time to spare?
I'll work with you tomorrow morning to get something set up on khan so we can try to test it.
(Reporter)

Updated

8 years ago
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro