Closed Bug 538206 Opened 15 years ago Closed 15 years ago

Add code to crash-report web heads to emit raw crash data to Hadoop cluster in addition to NFS

Categories

(Socorro :: General, task)

Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dre, Assigned: lars)

References

Details

Attachments

(4 files, 1 obsolete file)

We can't keep up with processing if we hammer the NFS server by scanning directories. The best way to solve this is to have the web heads write each crash report to the Hadoop cluster as soon as it is received. For the immediate future, this would be done in addition to the existing storage in NFS. Eventually, once we are 100% integrated, this could become the only storage method. We have a fairly simple Python script that uses the Thrift API to interact with the cluster; I'll attach it as sample code for what would need to be done on your side. We'd like to get this first integration point in as soon as we can, ideally by the end of January, so that we can continue testing and development without further impact to your existing infrastructure.
Extract and run python crashreports.py --help
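For readers without the attachment handy, here is a minimal sketch of what a Thrift-based submission like crashreports.py might look like. This is an illustration, not the attached file: the table name "crash_reports", the column families "meta_data:json" / "raw_data:dump", and the generated gen-py module layout are all assumptions.

# Illustrative only: write one crash report to HBase over Thrift.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase            # generated Thrift bindings (assumed layout)
from hbase.ttypes import Mutation

def store_crash(host, port, ooid, json_text, dump_bytes):
    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    transport.open()
    try:
        mutations = [
            Mutation(column="meta_data:json", value=json_text),
            Mutation(column="raw_data:dump", value=dump_bytes),
        ]
        # One row per crash, keyed by the ooid; table name is a placeholder.
        client.mutateRow("crash_reports", ooid, mutations)
    finally:
        transport.close()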
Target Milestone: 1.3 → 1.4
Assignee: nobody → lars
Questions: 1) Are the 'connections' based on HBaseConnection long-lived and reusable? Are they multithread-safe? My instinct is to cache one per thread and reuse it within that thread rather than pooling them and sharing between threads. Is establishing a connection a fast or slow process? 2) It looks like the 'create_ooid' methods are used to create new entries?
btw, the generated code Hbase.py is an utter abomination and makes the infant forms of all the world's great prophets cry.
I asked around a bit and have heard that lots of people will connect, process, and disconnect all in the context of a single web request, so I imagine it should be lightweight enough to do that. I haven't heard anything about thread safety, so I'll do a bit more research there. Yes, create_ooid and create_ooid_from_files are the two methods we've provided for you to create new entries. Poorly named, I'm sorry. :/ Feel free to rename all the methods in this thing to names that are consistent with your code. Nothing has to be locked down in the API yet.
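A rough sketch of the "cache one connection per thread" idea discussed above, using Python's threading.local. The connection_factory argument stands in for constructing the project's HBaseConnection wrapper; all names here are assumptions, not the shipped code.

import threading

_local = threading.local()

def get_connection(connection_factory):
    # Reuse a connection already opened by the calling thread, if any;
    # otherwise open one via the supplied factory and cache it.
    conn = getattr(_local, "hbase_connection", None)
    if conn is None:
        conn = connection_factory()
        _local.hbase_connection = conn
    return conn

# usage (hypothetical): conn = get_connection(lambda: HBaseConnection(host, port))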
I've a bifurcated version of the collector ready for testing...
(In reply to comment #5) To review or pick up the code, see: http://code.google.com/p/socorro/source/detail?r=1702
Pushing to 1.5. Waiting on Hadoop VM.
Target Milestone: 1.4 → 1.5
Shoot. I forgot that my last comment regarding this was an e-mail to Lars rather than on this bug. Trying to get a usable VM is taking some time. I was hoping that for the moment, we could test the code by pointing it at the staging Hadoop cluster. I can give connection details immediately if you guys can plug them in.
You can email me the details. We're working on shipping 1.4 Thursday and then I will focus on this. Thanks.
Note: we want a throttle in the collector code for how much gets sent to Hadoop.
Assignee: lars → ozten.bugs
Blocks: 464775
Blocks: 439679
Update: this is set up on khan and we've submitted a few thousand crashes and verified that they are getting stored. The processor spends on average 40% of its time writing to HBase. Very little time is spent getting a connection, and connections are reused in mod_python, so all the overhead is in the single call to write the JSON and dump data. We need retry logic for establishing connections and for calls to mutateRow that fail. I think we should stage the current version of the code, but I'm going to explore pulling the HBase writes out of the collector, as 40% almost doubles the amount of time it takes to process a request. The attached CSV files show the number of seconds it takes to 1) write to disk, 2) write to HBase, and 3) process a request from start to end (total time). Requests were against cm-hadoop09 and not thrift-hadoop; khan has poor performance and there is probably poor network connectivity between them. We can check stage performance to see how different it is from this dev test.
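As a rough illustration of the retry logic mentioned here (for connection setup and mutateRow calls), something along these lines could work. The exception caught and the backoff policy are assumptions, not the code that shipped.

import time
from thrift.transport.TTransport import TTransportException

def with_retries(operation, attempts=3, delay=1.0):
    # Run operation(); on a transport failure, retry with linear backoff.
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TTransportException:
            if attempt == attempts:
                raise                      # out of retries: let the caller decide
            time.sleep(delay * attempt)    # back off a little more each time

# usage (names from the earlier sketch, hypothetical):
# with_retries(lambda: client.mutateRow("crash_reports", ooid, mutations))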
40% sounds like a lot, but looking at your CSV, the average time to write a report is still only 0.07 seconds, right? That doesn't sound really bad. Also, does the write-to-disk section really take less than one millisecond? I'm surprised that writing to NFS is that fast.
(In reply to comment #13) No NFS writes, it's to khan's disk.
Ah. That sounds like it ought to be re-run in staging then. The staging setup is NFS like production, right?
Staging is finally free. Merged Lars's original code to trunk (r1761). Staging request made in bug 544912.
Depends on: 544912
@aravind, can you confirm comment #15?
On staging, it writes to local disk. However, we have a pretty fast NFS filer in prod, so writing to NFS isn't (or shouldn't be) a factor for us.
Patch is the transmitter side of a proposed change to trunk. Design review: http://code.google.com/p/socorro/wiki/SocorroTransmitter For brevity, changes to the Collector are not included, but they are minimal and described in SocorroTransmitter. (Of course, I will provide a full patch once we get further along.) Unit tests need to be fuller and integration tests written. Looking for design and code review. Thanks. Will point @aravind to this comment for operations/design review also.
Attachment #426404 - Flags: review?(ted.mielczarek)
Attachment #426404 - Flags: review?(tbuckner)
Attachment #426404 - Flags: review?(lars)
Attachment #426404 - Flags: review?(griswolf)
Per the doc linked, we want to decouple the hbase side of things from the collector. Have we verified that hbase is really too slow for this job and we simply cannot rely on it? It seems like it currently runs on a pool of 16 servers; surely, they ought to be able to keep up!

Also, the proposed design looks/feels overly complicated. We are moving things from the nfs store onto the webheads now, this will mean that we have to run these sweeper processes on multiple webheads. We will add to the load on the webheads, and add more variables into the mix.

Maybe, I don't understand the problem correctly here, so feel free to correct me. Why don't we begin by shoving data into hbase directly, and don't store anything on the NFS store. If we find that hbase is down (or isn't able to keep up), we fall back to writing them to NFS like we currently do. We can change (or add a new script) the monitor to sweep the nfs store like it currently does and push these dump/json pairs to the hbase store instead of keeping track of them in the db. The monitors would still be scanning the fs like it does now - only it now shoves these dump/json pairs into hbase and deletes them from the nfs store. In most cases, we would expect the NFS sweeps (by the monitor) to be empty. It would find stuff there only when hbase isn't able to keep up.

I feel like the proposed solution in comment 19 is pretty complicated and would involve some fancy coding and a ton of corner cases that would make it less robust. We already have a well debugged monitor and an engineered filesystem layout, I feel like we should (re)use that stuff.
I definitely think there are good questions brought up by Aravind, and we need to have a clear understanding of the requirements and design choices. Some points I'd like to make about the HBase Thrift layer:

1. We have heard some word of potential slowness in submitting, but I don't believe we yet have a good comparison benchmark that shows whether we are "too slow" and by how much. If we are falling behind, that needs to be taken care of by tuning pre-production first and, once we are satisfied with the per-node performance, by scaling the cluster as needed.

2. Right now, using the round-robin A records, we have a weak point in creating connections to HBase. A machine could be malfunctioning, or some of the machines might be under heavier load than the others. Having a real load balancer in place would help there.

3. In a production environment (excepting #2 above), being unable to connect to HBase should be as likely as being unable to connect to the NFS mount or to the PostgreSQL DB. We should think about that while designing failure recovery.

4. The most worrisome aspect of data storage at the moment is that there is currently a small chance of data loss in the following scenario:
   1. The collector submits a crash report to the cluster and the transaction is completed.
   2. Very shortly thereafter (a window of minutes), the node that received that crash report dies before the crash report is replicated out to the cluster.
   3. The data on the local disks of that node is not recoverable (i.e. the disk is fried).
If the node comes back up with the data on disk intact, there is no problem. The next release of Hadoop/HBase will support the distributed append log that is necessary to close this hole. That won't be for several months though, so we should be aware of this possibility even if it is rare, and we should plan accordingly.
Comment on attachment 426404 [details] [diff] [review]
First attempt at a transmitter to complement the Collector

comment #20, comment #21 apply.

testTransmitter.py:

self.rolledJournalFilenames ... has your own directory hard coded. You should use
testDir = os.path.dirname(__file__)
testDir = os.path.abspath(os.path.dirname(__file__))
to avoid that sort of thing.

nosetests can get shirty about removing logfiles before it is quite done with them. Better not to remove logfiles in a tearDown phase, just in case. Instead remove them during setUp().

transmitter.py:

line 53: That should not be a warn, but an info log call.
line 55: Should this sleep time be configurable?
line 56: After sleeping a maximum number of seconds or loops, you should log the problem as an error and quit.
line 106: This should be a debug logging call.
line 110: same. We expect both these behaviors from time to time.
line 207 more or less: Similar to line 56: After you try some number of times, give up. You could do it via raise Exception('too many retries') to fall into line 214.
line 214: should be "except Exception,x:"
line 215: should mention type(x) (line 216 does the rest)
line 235: Who writes these files? Could they send a signal when a file is ready? Sleep wait isn't actually too awful.
Attachment #426404 - Flags: review?(griswolf) → review-
(In reply to comment #20) Thanks @aravind, great feedback and questions.

> Have we verified that hbase is really too slow for this job and we simply
> cannot rely on it?

Sorry, I never stated that hbase was too slow, nor that hbase was unreliable. Two goals here:

1) The production Collector has no network dependencies. We've banished it from talking to Postgres. I want to preserve this property. It allows us to take much of Mozilla's infrastructure down and still continue to accept crashes without issue. In comment #21 Daniel makes a good point about the VIP or load balancer fronting Hadoop. What if we want to service it?

2) I have done some primitive perf testing. I'm not hugely concerned here, as I asked you whether doubling the collector webheads would be an issue if we decreased the number of crashes per second that a collector could process, and you had no concerns. So until we have reason for concern, we're good to go on performance.

> Also, the proposed design looks/feels overly complicated.

Let's make it simpler. How?

The transmitter in a nutshell:
* a daemon which reads text files
* transmits crashes from disk into hbase
* cleans up text files
* can be killed at any time
* cleans up after itself or a previous transmitter
* can be run in parallel
* can be deployed on any machine with text files and crashes

There is very little care and feeding required; I think I put too much detail into the wiki. It will be easy to monitor:
* how many rolled journals exist on the filesystem
* how many transmitter pids are alive
* how many CRITICAL log lines in the last 5 minutes

> We are moving things from the nfs store onto the webheads now, this will
> mean that we have to run these sweeper processes on multiple webheads.
> We will add to the load on the webheads,

Sorry, I didn't mean to imply that we move .dump and .json off of NFS and onto webheads. Other than logging, I don't think there is more load. This is where I need your help and feedback. A collector logs to a text file (journal). We roll the journal files every 5 minutes from the collector (webhead) and they get processed by transmitters (running on ? box) which need nfs access. The text file contains the UUIDs, and the transmitter reuses the dumpStorage logic for finding the crash. We could transmit these rolled journals anywhere for processing. It decouples the collector webheads.

> and add more variables into the mix.

A transmitter is a standalone process, so we can reason about and troubleshoot it more easily than if we layer it onto an existing system like the collector or monitor. I agree adding more variables into the mix isn't a good idea. We want simple processes that do only one thing and which have known requirements and limitations.

> Maybe, I don't understand the problem correctly here, so feel free to correct
> me. Why don't we begin by shoving data into hbase directly, and don't store
> anything on the NFS store. If we find that hbase is down
> (or isn't able to keep up), we fall back to writing them to NFS like
> we currently do.

This is a great point. We can do this. What are the elements of retrying crashes which were written to NFS? We need a list of crashes and a daemon or cron job to send them. The list of crashes could come from walking the file system or be read from a text file. If you want to parallelize the resubmission into HBase, then you need a coordination mechanism. Keeping HBase submission in the collector, the elements we've described mean either overloading the monitor with this task or creating another sub-system (transmitter).
We don't know what the next major phase of the monitor / processor will look like, but let's assume the monitor and NFS disappear. Does the transmitter design need to be modified? If we layer these responsibilities onto the monitor, won't we need to extract it or rewrite it?

> We can change (or add a new script) the monitor to sweep the nfs store
> like it currently does and push these dump/json pairs to the hbase store
> instead of keeping track of them in the db. The monitors would still be
> scanning the fs like it does now - only it now shoves these dump/json pairs
> into hbase and deletes them from the nfs store.

This is a good stepwise refinement of our current architecture. I can see this working once we've replaced our current system completely with HBase, but how does this work with our requirement of storing the crashes in NFS and HBase at the same time (the interim hybrid system)? We cannot delete the files to mark them as transferred, so how do we capture this information? We could use symlinks, but won't that require more FS code and increase inode counts? We could use an embedded database (SQLite), a simple message queue, or a journal (text file). I think we need a solution that will work now (while raw crashes are in both systems), and bonus points if it doesn't need to be modified if and when we switch over.

> I feel like the proposed solution in comment 19 is pretty complicated
> and would involve some fancy coding and a ton of corner cases that
> would make it less robust. We already have a well debugged monitor
> and an engineered filesystem layout, I feel like we should
> (re)use that stuff.

I agree and disagree. I agree that production code is better than new code. I disagree in that the monitor, processor, and collector are well designed and highly focused sub-systems, and adding new responsibilities to them will increase coupling and reduce cohesion.
(In reply to comment #22) @griswolf thanks. I'll update the logging and unit tests.

> line 55: Should this sleep time be configurable?

Ya, I'll change that.

> line 56: After sleeping a maximum number of seconds or loops,
> you should log the problem as an error and quit.

and

> line 207 more or less: Similar to line 56: After you try
> some number of times, give up. You could do it via raise
> Exception('too many retries') to fall into line 214.

I struggled with this. If we do exit, how does the transmitter get restarted? The current design is a daemon so it has an infinite loop.

line 207 the intent is that we are here only because of networking errors. Any other type of error will cause the entry to be marked as FAIL and continue and not have a retry/loop.

If we give up after N tries and return to 214, (with the current design) we will lose that entry and assuming the network issue is still present go through the same N tries and lose the next entry, etc. Is there something better to do than loop? A benefit of the CRITICAL entry is we can use "checklog.pl" or some other Nagios alarm on it as an indicator that the network between transmitter and HBase is down.

> line 214: should be "except Exception,x:"
> line 215: should mention type(x) (line 216 does the rest)

Good call, I didn't know this pattern. Cool.

> line 235: Who writes these files? Could they send a signal when a file is ready? Sleep wait isn't actually too awful.

So the modpython-collector is writing the files. It probably lives on some other box. If we have to support real-time transmission of crashes, then we'll need something like this, but I think sleep is okay for now.

I'll make these changes, thanks again.
(In reply to comment #24)
> (In reply to comment #22)
> > line 207 more or less: Similar to line 56: After you try
> > some number of times, give up. You could do it via raise
> > Exception('too many retries') to fall into line 214.
>
> I struggled with this. If we do exit, how does the transmitter get restarted?
> The current design is a daemon so it has an infinite loop.
>
> line 207 the intent is that we are here only because of networking errors. Any
> other type of error will cause the entry to be marked as FAIL and continue and
> not have a retry/loop.
>
> If we give up after N tries and return to 214, (with the current design) we
> will lose that entry and assuming the network issue is still present go through
> the same N tries and lose the next entry, etc. Is there something better to do
> than loop? A benefit of the CRITICAL entry is we can use "checklog.pl" or some
> other Nagios alarm on it as an indicator that the network between transmitter
> and HBase is down.

Logging: You really don't want to log every success or step except at debug: you can fill a 1000000 character logfile in a few seconds (experience speaking), which means interesting stuff gets rolled out of sight in a few minutes.

Daemon process versus failures: At some point, if things are all poo, you want your daemon to give up. A non-running daemon is as easy or easier to find with nagios as a log entry, and a running one can lose data or otherwise muddy the water. From other comments, I thought this was going to be a short-run process (every 5 minutes?)... letting cron be the daemon, in essence, in which case dying from troubles is not a big deal: wait for the next time around and try again.

I'll look at the program flow again and see if I can find a nice(er) path through error conditions.
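A minimal sketch of the pattern being debated here: retry a bounded number of times, then raise so the process exits and cron or Nagios notices. submit_one and the limits are placeholders, not the actual transmitter code.

import time

def submit_with_limit(submit_one, max_tries=5, sleep_seconds=30):
    # Try the HBase submission a bounded number of times, then give up
    # loudly instead of looping forever.
    for attempt in range(1, max_tries + 1):
        try:
            submit_one()
            return
        except Exception:
            if attempt == max_tries:
                break
            time.sleep(sleep_seconds)
    raise Exception('too many retries')   # falls into the caller's except block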
I've found a bug and am not able to use TimedRotatingFileHandler for rolling over the journal files. @aravind, oremj - What would be the best way to handle these journals? http://code.google.com/p/socorro/wiki/SocorroTransmitter#Collector_Details The question is described in the Journal Rollover Details section.
One option is ordinary RotatingFileHandler which rotates when it gets too big. In general when it overfills, it works backwards renaming basename.N to basename.N+1 (and basename to basename.1), then begins filling the new basename. You can set a limit on the number of such files (50 seems common in our code). This technique is used by modpython-collector
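For reference, a size-based rotation as described above would be configured roughly like this; the path and limits are examples only.

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    "/var/log/socorro/collector-journal.log",  # example path
    maxBytes=10 * 1024 * 1024,   # rotate when the file reaches ~10 MB
    backupCount=50,              # keep journal.log.1 ... journal.log.50
)
journal = logging.getLogger("collectorJournal")
journal.setLevel(logging.INFO)
journal.addHandler(handler)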
We still need input from Aravind on comment #26, but I've made the log handling a simple FileHandler that could be rolled via logroller or another system tool outside of Python. This patch incorporates the changes from the patch 1 code review, as well as adding the minor changes to the collector, unit tests, and an integration test.
Attachment #426404 - Attachment is obsolete: true
Attachment #427139 - Flags: review?(ted.mielczarek)
Attachment #427139 - Flags: review?(tbuckner)
Attachment #427139 - Flags: review?(lars)
Attachment #427139 - Flags: review?(griswolf)
Attachment #426404 - Flags: review?(ted.mielczarek)
Attachment #426404 - Flags: review?(tbuckner)
Attachment #426404 - Flags: review?(lars)
(In reply to comment #28)
> We still need input from Aravind on comment #26

I don't understand this design well enough to comment on what's needed.
(In reply to comment #29) I'm happy to add more detail to http://code.google.com/p/socorro/wiki/SocorroTransmitter#Collector_Details What's missing?
(In reply to comment #28) The patch doesn't apply cleanly, and I'm not sure how to generate a cleaner one, so here are the steps for applying this patch. When prompted, answer 'socorro/transmitter/hbaseClient.py':

Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: socorro/transmitter/__init__.py
|===================================================================
|Index: socorro/transmitter/hbaseClient.py
|===================================================================
|--- socorro/transmitter/hbaseClient.py  (revision 0)
|+++ socorro/transmitter/hbaseClient.py  (working copy)
--------------------------
File to patch: socorro/transmitter/hbaseClient.py
socorro/transmitter/hbaseClient.py: No such file or directory

You will also need 'socorro/transmitter/hbaseClient.py'. You can copy it from khan: /home/aking/breakpad/socorro-hdfs/socorro/transmitter/hbaseClient.py or I'll attach it to this bug, also.
Attaching the original socorro/transmitter/hbaseClient.py
(In reply to comment #30)
> I'm happy to add more detail to
> http://code.google.com/p/socorro/wiki/SocorroTransmitter#Collector_Details
>
> What's missing?

I guess I specifically don't understand some of the details. If I understand the proposed design correctly, you are having the webheads log stuff to some journal files? and then a sweeper program runs on each webhead and submits data to hbase. Is that correct?

Before I can comment on #26 - do all the collectors on a single box write to the same log file? If so, I am assuming you are somehow accounting for multiple collectors wanting to log to the same file? You could shove this off to syslog - but syslog does not make guarantees of the order in which stuff comes in - and sometimes it can drop data. It does this to ensure that it does not become the bottleneck. We frequently run it in this mode on the webheads. It can also run in a mode where it can be asked to run more correctly, but that would need a dedicated remote syslog host, etc. What we pick depends on how critical these logs are?

Also, it appears that the primary motivation for this design is that the previous collector didn't depend on external factors, and so this one shouldn't either. The previous design worked the way it did for a few reasons; we chose to keep it independent because we wanted the flexibility to re-architect the backend without impacting the collector. With HBase, we don't have that many options to change in the backend. I still say we should revisit the design and have it shove data directly into hbase and fall back to nfs if things don't work there. Also, you asked about how we would transition this into the new system; we could have the collector send some X% of crashes into hbase and ramp it up from there, or some other combination of things like that.
(In reply to comment #33) Cool, thanks Aravind.

> I guess I specifically don't understand some of the details.
> If I understand
> the proposed design correctly, you are having the webheads log
> stuff to some journal files? and then a sweeper program
> runs on each webhead and submits data to hbase. Is that correct?

Yes, webheads log to journal files. No, a sweeper runs wherever you want to deploy it.* Yes, the sweeper submits data to hbase. Occasionally the journal logs are rolled and given a date extension. These rolled logs are the inputs to the transmitter.

* Requirements for the transmitter deployment environment:
- rolled journal logs (read/write access to local disk)
- read access to NFS (dump, json)
- network connection to hbase

Having 1 log per webhead and rolling every 5 minutes might be a good way to break up the files, so that multiple transmitters could be run in parallel to submit the crashes.

> Before I can comment on #26 - do all the collectors on a single box
> write to the same log file? If so, I am assuming you are somehow
> accounting for multiple collectors wanting to log to the same file?

This is configurable and up to you. On a single webhead, I think FileHandler is up to the task of having N mod_python threads write to the same log, where N is the number of worker threads you have configured for mod_python.

> You could shove this off to syslog - but syslog does not make guarantees
> of the order in which stuff comes in - and sometimes it can drop data.
> It does this to ensure that it does not become the bottleneck. We
> frequently run it in this mode on the webheads. It can also run
> in a mode where it can be asked to run more correctly, but that would
> need a dedicated remote syslog host, etc. What we pick depends on
> how critical these logs are?

Maintaining order in the journal isn't important. Dropping data is unacceptable (in the long-term system; short term it might be acceptable). Yes, the journal log is critical; it is the main way to track a UUID which hasn't been submitted successfully to HBase/Hadoop.
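A sketch of the per-webhead journal logger described above: the logging module serializes writes from multiple threads within one process, so a plain FileHandler behind a module-level logger should suffice. The path and logger name are examples, not the project's configuration.

import logging

journal = logging.getLogger("crashJournal")
if not journal.handlers:
    journal.addHandler(logging.FileHandler("/var/log/socorro/journal.log"))
    journal.setLevel(logging.INFO)

def record_received(ooid):
    # One UUID per line, for a transmitter to pick up later.
    journal.info(ooid)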
A couple of questions (because they aren't immediately obvious to me): I'm sensitive to any performance impact on the pm-app-generic web cluster. I like the idea of removing external dependencies, including NFS. Do we have a handle on performance metrics, specifically disk I/O? My current understanding is that that's hard to test in stage because the workload isn't as high as prod. Should we instead build out a separate collector "cluster"? The current platform is shared with a number of other production websites. I'm okay if this process affects itself, less so if it affects other sites. Even if this is short-term, perhaps a dedicated cluster is the right move?
I've said so outside this bug but will say it here to make it official. A lot of the contention is that the crash-report system shares infrastructure with the generic web cluster and there are competing demands on it. For the next release we should build out a 3-node cluster to handle this system. The HBase fallback should be to local disk. This also lets us build out another cluster in Phoenix that has no NFS dependencies. Yell if there's any opposition; otherwise add the code, make the write path configurable, and we'll figure out how and where to write it.
Assignee: ozten.bugs → lars
Here's my take on the design of this new system. For now, we need a collector that is two pronged: it writes to the standard NFS just as we've always done, and it writes to HBase. I've refactored the collector to segregate these two abilities so that in the future we can just eliminate the NFS code. As we move to HBase, we should add the following enhancements to the collector:

1) The HBaseConnection class should be robust and predictable: Bug 548355

2) The collector employing the HBaseConnection class should react to known failures and switch to saving crashes on the local disk. This local storage should use the same JsonDumpStorage system that we employ for NFS, since it is known to be robust, edge cases in sharing were addressed long ago, and it already contains methods that allow discovery of new entries. Bug 548359

3) A new process should be written that runs independently on each box that runs a collector. This process (probably cron) will watch the local JsonDumpStorage and attempt to submit any entries that it finds to HBase. Failure just means quitting and trying again later. Bug 548362

This system has several advantages over the previously proposed 'transmitter' system. It completely eliminates the dependency on NFS. By employing JsonDumpStorage on local storage, we eliminate the need for the transmission of streams of uuids via a logging mechanism.
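A condensed sketch of the fallback flow described above. create_ooid is named earlier in this bug and JsonDumpStorage is the existing class; newEntry, the argument list, and the blanket exception handling are placeholders rather than the final implementation.

def save_crash(hbase_connection, local_storage, ooid, json_doc, dump):
    # Try HBase first; on a known failure, park the crash in local
    # JsonDumpStorage so the cron process in item 3 can resubmit it later.
    try:
        hbase_connection.create_ooid(ooid, json_doc, dump)
    except Exception:
        # local_storage is a JsonDumpStorage instance on local disk;
        # newEntry is a placeholder method name for illustration.
        local_storage.newEntry(ooid, json_doc, dump)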
Target Milestone: 1.5 → 1.6
Why was this bumped to 1.6? I think this should be marked as fixed 1.5
I bumped all remaining open bugs in 1.5 to 1.6 after the Thursday release. If it's fixed, just change it to 1.5 and close it out.
Comment on attachment 427139 [details] [diff] [review] Second attempt at transmitter I think you've got enough other expertise here requested, re-request if you really need me.
Attachment #427139 - Flags: review?(ted.mielczarek)
Attachment #427139 - Flags: review?(tbuckner)
Attachment #427139 - Flags: review?(lars)
Attachment #427139 - Flags: review?(griswolf)
Attachment #427139 - Flags: review?
This is done, though some points in the discussion will come up again (comment #36 and comment #37).
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro