Need more cluster friendly mini-dump stack walker

RESOLVED FIXED in 1.5

Status

Socorro
General
--
major
RESOLVED FIXED
9 years ago
6 years ago

People

(Reporter: dre, Assigned: ted)

Tracking

({perf})

Details

I believe what would work best for us is a new version of, or replacement for,
minidump stackwalk that exposes a function which receives the JSON and the
binary dump data as two input arguments and returns the new JSON string as
output.

Then we could easily wrap that with the Pipes API and have a processor that
wouldn't have to fork a new process for every single crash report being
processed.
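
To make the request concrete, here is a minimal Python sketch of the shape being asked for: one function that takes the crash JSON and the raw dump bytes and returns the new JSON string, plus a map step that calls it repeatedly inside a single process. The names `walk_stack`, `process_crash`, and `map_records` are hypothetical, and `walk_stack` is only a stub standing in for the real stack walker.

```python
import json


def walk_stack(dump: bytes) -> list:
    """Hypothetical stand-in for the real stack walker.

    A real implementation would parse the minidump; this stub only
    reports the dump size so the example runs on its own.
    """
    return [{"frame": 0, "note": f"stub frame for {len(dump)}-byte dump"}]


def process_crash(meta_json: str, dump: bytes) -> str:
    """Take the crash JSON and the binary dump, return the new JSON string."""
    meta = json.loads(meta_json)
    meta["stack"] = walk_stack(dump)
    return json.dumps(meta, sort_keys=True)


def map_records(records):
    """Map step: handle many (JSON, dump) pairs in one process, no fork per crash."""
    for meta_json, dump in records:
        yield process_crash(meta_json, dump)


if __name__ == "__main__":
    sample = [('{"crash_id": "abc"}', b"MDMP\x00\x01")]
    for out in map_records(sample):
        print(out)
```

The point of the design is that the per-crash cost becomes a function call rather than a process launch, which is what a Pipes mapper would need.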

This API uses sockets to handle communication between the Hadoop Java layer and
a C++ component.  The component can take over as many of the subsystems of a
MapReduce job as is needed and the rest will remain in the main Hadoop Java
world.  It can do the Map, the Reduce, and any of the utility functions like
reading records. 

Would this be something that we could prototype in time for me to use it in my
proof-of-concept system this quarter?  I can probably hack in the Pipes API
without too much trouble as long as I have the program with the function
described above.

Here are a few links to information about the Pipes API: 
http://developer.yahoo.com/hadoop/tutorial/module4.html#pipes 

The Pipes "hello world" application:
    SVN form: http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/src/examples/pipes/
        A sample program that implements both Map and Reduce (we'd only need
        the Map for our purposes, though):
        http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/src/examples/pipes/impl/wordcount-simple.cc
    Wiki form: http://wiki.apache.org/hadoop/C%2B%2BWordCount
(Assignee)

Comment 1

9 years ago
You're talking about getting it done by the end of December? That seems like a pretty tight timeframe. I'm not sure I'll have time to get something to you in that short a time.

Comment 2

9 years ago
Given that Daniel says this is a day or two of work, I think we are talking about the next week or two. It's blocking the work on getting crash dumps processed and stored in Hadoop, which is pretty important. Can we get this bumped in priority?
(Assignee)

Comment 3

9 years ago
This is not a day or two's worth of work in my estimation. I don't think it's a huge task, but it's going to take a little bit of time, and with the all-hands next week and only one full week after that until Christmas, there isn't a lot of time left in the quarter. I have an e10s task lined up as well as a bunch of crashkill-related work.
(Assignee)

Comment 4

9 years ago
I may have come off a bit harsh there! I do think this is a great idea, and I'd love to do it, I just don't think I'll have time in Q4 with all the other demands on my time.

Comment 5

Requirements have shifted around a bit on the edges, but the core of this request is still sound. We need a version of minidump stackwalk that is friendly to being invoked from the cluster.
The latest strategy is to wrap it in a listener daemon that accepts requests containing a stream of bytes representing the crash dump file and returns the machine-readable stack output. This daemon can easily be invoked not only from our cluster, but also from our priority processor and the existing Socorro infrastructure (which could potentially benefit from not having to fork a new instance of the program for every crash report).
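
As an illustration of the daemon idea, here is a minimal Python sketch of a listener that reads a dump as a stream of bytes and replies with stack output. Everything in it is an assumption made for the example: the 4-byte length-prefix framing, the names `StackwalkHandler` and `request_stack`, and the stubbed `walk_stack`. It does not describe the wire protocol of any actual implementation.

```python
import socket
import socketserver
import struct
import threading


def walk_stack(dump: bytes) -> bytes:
    """Hypothetical stand-in for the stack walker; returns placeholder text."""
    return f"stub-stack|dump_bytes={len(dump)}\n".encode("ascii")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before sending full message")
        buf.extend(chunk)
    return bytes(buf)


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message: 4-byte big-endian length, then the payload (assumed framing)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


class StackwalkHandler(socketserver.BaseRequestHandler):
    """Handle one request: read the dump bytes, reply with the stack output."""

    def handle(self):
        dump = recv_msg(self.request)
        send_msg(self.request, walk_stack(dump))


def request_stack(host: str, port: int, dump: bytes) -> bytes:
    """Client side: send a dump to the daemon and return its reply."""
    with socket.create_connection((host, port)) as sock:
        send_msg(sock, dump)
        return recv_msg(sock)


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; a real daemon would use a fixed one.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), StackwalkHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        print(request_stack(host, port, b"MDMP" + b"\x00" * 12).decode("ascii"))
        server.shutdown()
```

Because the daemon stays resident, any caller that can open a socket (the cluster, the priority processor, or the existing processors) can reuse the same running stack walker instead of launching one per crash.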

Ted has a working prototype of this code so I'm assigning the bug to him so he can attach or link to it here for Socorro 1.5 tracking purposes.

Note that this daemon is separate from the "symbol server" tracked in bug 526512, which we still need and which will work in conjunction with this.
Assignee: nobody → ted.mielczarek
Target Milestone: --- → 1.5
(Assignee)

Comment 6

8 years ago
My implementation is here:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/stackwalk-net

I just added a README, and the repo contains a sample client in Python.

I'm going to call this fixed; we can certainly reopen or file new bugs if there are other issues to deal with.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Blocks: 542855

Comment 7

8 years ago
Hey Ted,

Can I get a command-line option added for the port (maybe -p)?
Component: Socorro → General
Product: Webtools → Socorro