Closed Bug 542855 Opened 15 years ago Closed 15 years ago

Need Socorro crash report processor python code to be refactored into a reusable library

Categories

(Socorro :: General, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dre, Assigned: lars)

References

Details

(Keywords: perf)

Attachments

(1 file)

Per bug 532226, we now have a version of minidump stackwalk that can be started up once and listen for requests to process crash reports. This is critical for the Hadoop cluster, but it could also increase our crash report processing throughput with the existing Socorro infrastructure, since it won't have to fork a new process for every crash report. I'd like to get the Socorro Python code refactored a bit into a set of simple methods that take care of the different parts of crash report processing (metadata processing, walker invocation, walker output processing). Hopefully we can refactor the existing code to use these new APIs so that we'll have good code coverage for as long as the existing system is still in use.
We might consider requiring Python 2.6.4 for future versions. Ted has returned a new version of the processor that accepts byte streams over a socket. In attempting to refactor the Socorro code to use this, it looks like Python 2.6.4 has a very convenient method on sockets that will return a file-like object. This would allow changing over to the new system to be accomplished with a very small code change. However, given that all of this is rather intimately linked to the code in line for refactoring, which might need many changes, I'm also willing to scrap it and figure out a different solution (i.e. just read all the lines into memory and make sure nothing tries to do any strange caching things). Any thoughts?
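For readers following along, the file-like approach mentioned above can be sketched as follows. This is a minimal illustration written for current Python 3, not the Socorro code: `read_lines_from_socket` and `demo` are hypothetical names, the payload is made-up sample text, and a local `socket.socketpair()` stands in for the stackwalk server.

```python
import socket


def read_lines_from_socket(sock):
    """Wrap the socket in a file-like object and iterate over its lines."""
    # makefile() returns a file object tied to the socket, so code that
    # previously iterated over a file or pipe can stay largely unchanged.
    with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
        return [line.rstrip("\n") for line in stream]


def demo():
    # Hypothetical stand-in for the server: one end of a socket pair
    # writes a short response and then signals end-of-stream.
    server_end, client_end = socket.socketpair()
    with server_end, client_end:
        server_end.sendall(b"first line\nsecond line\n\nlast line\n")
        server_end.shutdown(socket.SHUT_WR)
        return read_lines_from_socket(client_end)
```

Calling `demo()` returns the four lines of the sample payload, including the empty one.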
(In reply to comment #1) Working with IT to see if we can get them to package and support Python 2.6.4.
I really don't want to derail pushing this change by making it depend on upgrading to Python 2.6 unless we don't have any viable alternative. Could someone else possibly take a look at the code from bug 532226 and see if there is a reasonable way to refactor the existing file-based call to minidump stackwalk without having to use the 2.6 socket makefile method?
I'm tempted to just read everything into a list of lines and pull out all the caching stuff that wraps the file. I don't think the memory usage would be much worse, as I think I found a place where it caches the whole file anyway.
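The read-everything-into-a-list alternative could look roughly like this. It is a sketch under assumptions, not the code that was eventually written: `recv_all_lines` and `demo_recv` are hypothetical names, the response is assumed to be UTF-8 text, and the peer is assumed to close or shut down its side when it has finished sending.

```python
import socket


def recv_all_lines(sock, chunk_size=4096):
    """Read until the peer closes the stream, then split into lines."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # an empty read means the peer has finished sending
            break
        chunks.append(chunk)
    # The whole response is held in memory at once; no file wrapper or
    # caching layer is involved.
    return b"".join(chunks).decode("utf-8").splitlines()


def demo_recv():
    # Hypothetical stand-in for the server, using a local socket pair.
    server_end, client_end = socket.socketpair()
    with server_end, client_end:
        server_end.sendall(b"alpha\nbeta\n\ngamma\n")
        server_end.shutdown(socket.SHUT_WR)
        return recv_all_lines(client_end)
```

The trade-off is the one described in the comment above: memory use grows with the size of the response, in exchange for not depending on the socket's file-like wrapper.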
The current method that analyzes the processed minidump frames calls self.framesTable.insert(databaseCursor, (reportId, frame_num, date_processed, thisFramesSignature[:255]), self.databaseConnectionPool.connectToDatabase, date_processed=date_processed) to insert a row for each frame (if there are fewer than 10 frames). In refactoring this into a library, I'm going to just return a list of (reportId, frame_num, date_processed, thisFramesSignature[:255]) tuples, and the caller can do whatever it wants with them. That said, what do we want to do with it in HBase? We could just serialize it in the JSON, but I'm not sure what we use it for later.
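A sketch of the "return a list instead of inserting" idea follows. The tuple layout and the 255-character truncation come from the comment above; the function name `build_frame_rows` is hypothetical, and the frame limit is read here as "keep only frames numbered below 10", which is an assumption about what the existing code does.

```python
from datetime import datetime

MAX_FRAMES = 10              # assumed reading of the 10-frame limit
MAX_SIGNATURE_LENGTH = 255   # truncation length from the existing insert call


def build_frame_rows(report_id, date_processed, frame_signatures,
                     max_frames=MAX_FRAMES):
    """Return frame rows as tuples instead of writing them to a database."""
    rows = []
    for frame_num, signature in enumerate(frame_signatures):
        if frame_num >= max_frames:
            break
        rows.append((report_id, frame_num, date_processed,
                     signature[:MAX_SIGNATURE_LENGTH]))
    return rows


# Example input: one overlong signature and one short one.
example_rows = build_frame_rows(42, datetime(2010, 1, 28), ["a" * 300, "b"])
```

The caller can then insert the rows into SQL, serialize them into the JSON document, or discard them, without the library needing a database connection.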
Blocks: 464775
Blocks: 439679
Here is a first pass of the library code. It currently talks to a stackwalk server for the minidump processing, but does not talk to HBase. I have not yet integrated it into the processor module, though I should have a first pass tomorrow. It should be much easier to do this if we also stop writing to SQL at the same moment we start using the library. Otherwise we need a function that goes through the result object and writes everything that was previously written in a bunch of different far-flung functions. processEverything(uuid,jsonPath,dumpPath,host,port) gives a pretty good rundown of how the component functions should be used, though it lacks HBase integration (which should be trivial to add). I'm also not convinced that this is the right way to deal with the config options. All criticism and advice is encouraged.
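The attachment itself is not reproduced in this thread, so the following is only a guess at the general shape of such a composition, not the attached code. Apart from the parameter list of processEverything, everything here is hypothetical: the helper names, the extra `invoke_walker` argument (injected so the sketch runs without a stackwalk server), the result dictionary, and the assumption that walker output is pipe-delimited text.

```python
import json
import os
import tempfile


def process_metadata(json_path):
    """Step 1: load the crash report's JSON metadata."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def process_walker_output(lines):
    """Step 3: split each non-empty line into fields (assumed '|'-delimited)."""
    return [line.split("|") for line in lines if line]


def process_everything(uuid, json_path, dump_path, host, port, invoke_walker):
    """Run the three steps in order and return one result object."""
    metadata = process_metadata(json_path)
    raw_lines = invoke_walker(dump_path, host, port)  # step 2: walker invocation
    return {
        "uuid": uuid,
        "metadata": metadata,
        "walker_records": process_walker_output(raw_lines),
    }


def demo_pipeline():
    # A fake walker and a temporary metadata file keep the example self-contained.
    def fake_walker(dump_path, host, port):
        return ["key|value", "", "0|frame"]

    with tempfile.TemporaryDirectory() as workdir:
        json_path = os.path.join(workdir, "report.json")
        with open(json_path, "w", encoding="utf-8") as handle:
            json.dump({"ProductName": "Example"}, handle)
        return process_everything("abc", json_path, "report.dump",
                                  "localhost", 0, fake_walker)
```

Because nothing in this shape writes to a database, the caller decides where the result object goes, which matches the point above about stopping the SQL writes when the library is adopted.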
I have a lot of comments about this code but I'm not in a position to type them all. We need to set up a phone meeting so that we can discuss it. I am having medical procedures today, so this proposed phone meeting can't take place until tomorrow (Friday).
Going to put this code on hold for the time being until Socorro resources free up for a thorough design plan.
-> 1.6; focusing on hbase and report collection first.
Target Milestone: 1.5 → 1.6
Assignee: nobody → lars
Target Milestone: 1.6 → 1.7
This is changing definition, and will be in 1.8
Target Milestone: 1.7 → 1.8
This is done, though not yet integrated into the trunk.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro