On 2010-04-08 the collector stalled because HBase calls were taking too long. In 1.7, HBase becomes the primary storage and we cannot afford such a failure. With timeouts, we can fall back to local storage quickly. The collector will not bog down, and we won't lose crashes like we did. Can the timeouts be implemented in the Thrift layer?
Priority: -- → P1
Target Milestone: --- → 1.7
Anurag and I will hit up people on the #hbase channel to see what we can get done with this.
The Python version of hbaseClient appears to have an infinite timeout. I would assume/hope that adding a socket timeout will resolve this issue. Adding the following line below line #69, i.e. below
    transport = self.tsocketModule.TSocket(self.host, self.port)
should do the trick. Line to be added:
    transport.setTimeout(1000)  # in ms
I have checked out the code from http://code.google.com/p/socorro/source/browse/trunk/socorro/hbase/ and made the change, but I am not sure how to test it locally. Daniel, can you let me know how to test it (mainly config details, paths, etc.) once you have some time to spare?
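For reference, the timeout-then-fall-back behaviour requested in this bug could be sketched roughly as below. This is only an illustration using the standard library socket module as a stand-in for the Thrift transport; the names save_crash and make_socket_store, the one-byte acknowledgement, and the file-per-crash fallback layout are all hypothetical and are not taken from the Socorro code.

```python
import os
import socket


def make_socket_store(host, port, timeout_s):
    """Return a store function that gives up after timeout_s seconds.

    Hypothetical stand-in for the HBase/Thrift client: it sends the
    payload and waits for a one-byte acknowledgement.
    """
    def store(crash_id, payload):
        # The timeout applies to connect, send and recv on this socket.
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(payload)
            # Raises socket.timeout if the server stalls instead of replying.
            if not sock.recv(1):
                raise OSError("connection closed before acknowledgement")
    return store


def save_crash(crash_id, payload, primary_store, fallback_dir):
    """Try the primary store; on timeout or socket error, write locally."""
    try:
        primary_store(crash_id, payload)
        return "primary"
    except (socket.timeout, OSError):
        # Assumed fallback layout: one file per crash in fallback_dir.
        path = os.path.join(fallback_dir, crash_id)
        with open(path, "wb") as f:
            f.write(payload)
        return "fallback"
```

Note that the standard library timeout is given in seconds, whereas the Thrift TSocket setTimeout call mentioned above takes milliseconds.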
I'll work with you tomorrow morning to get something set up on khan so we can try to test it.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro