Bug 594777
Opened 14 years ago
Closed 14 years ago
implement a map/reduce job to produce a list of modules from a day's worth of crash reports
Categories: Socorro :: General, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: ted, Assigned: xstevens
Currently, to fill in missing symbols for Windows crash reports, I have a script that hits the Socorro ATOM feed for the 500 most recent Windows crashes, and looks at their module lists to find missing symbols.
This isn't great, since 500 crashes a day is a pretty small sample, and as bug 575817 indicates, we're still missing a lot of symbols. I'd like to instead use a map/reduce job to provide the input, since we should be able to run against a much larger sample set (like the entire day's worth of crashes).
I'm happy to help write the map/reduce job here, although I don't know the first thing about Hadoop. I also don't know exactly how we'd make the output available to my script for further processing. Just make it available via HTTP somehow?
Comment 1 • 14 years ago (Reporter)
The logic would be something like:
  for every crash in the set:
    if this is not a Windows crash, skip it
    otherwise, take all lines starting with Module| from the raw dump, split them on '|', and insert fields 1, 3, 4 (zero-indexed) as a row in the result set
  for reducing, remove duplicate rows to get only unique rows in the output.
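A minimal Python sketch of the logic above, for illustration only (it is not the Hadoop job itself). The function names `map_crash` and `reduce_rows` are hypothetical, and it assumes each crash arrives as an OS name plus the raw dump text, with Windows crashes identified by an OS name starting with "Windows".

```python
from typing import Iterable, Iterator, List, Tuple

# One output row: fields 1, 3 and 4 of a Module| line (zero-indexed).
Row = Tuple[str, str, str]


def map_crash(os_name: str, raw_dump: str) -> Iterator[Row]:
    """Map step: emit one row per Module| line of a Windows crash."""
    # Assumption: Windows crashes have an OS name starting with "Windows".
    if not os_name.startswith("Windows"):
        return
    for line in raw_dump.splitlines():
        if not line.startswith("Module|"):
            continue
        fields = line.split("|")
        if len(fields) < 5:
            continue  # too few fields to hold indexes 1, 3 and 4
        yield (fields[1], fields[3], fields[4])


def reduce_rows(rows: Iterable[Row]) -> List[Row]:
    """Reduce step: keep only unique rows, sorted for stable output."""
    return sorted(set(rows))
```

Collecting the rows from every crash and passing them through `reduce_rows` gives the deduplicated list; in a real map/reduce job the framework would do the grouping between the two steps.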
Updated • 14 years ago (Assignee)
Assignee: nobody → xstevens
Comment 2 • 14 years ago (Assignee)
Ted,
I have something like this written already. I'll modify it and clean it up a bit for you.
Comment 3 • 14 years ago (Assignee)
Do you want comma-delimited output or a different delimiter? What about entries where fields 3 and 4 are blank? Other than these minor issues, I've got a job that we can run. We'll probably want to wrap a shell script around it to get the results off of Hadoop and put them somewhere. The output will just be plain text files, so you can do what you want with them from there.
Comment 4 • 14 years ago (Reporter)
CSV is fine. Entries with blank fields 3 and 4 can be dropped from the output. Thanks!
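A rough sketch of that output rule in Python, assuming rows are (filename, debug file, debug ID) tuples. The helper name `rows_to_csv` is hypothetical, and treating a row as droppable when either of the last two fields is blank is an assumption; the stricter reading (both blank) would use `and` instead of `or`.

```python
import csv
import io


def rows_to_csv(rows):
    """Write rows as CSV text, dropping rows with a blank debug file or ID."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for filename, debug_file, debug_id in rows:
        # Assumption: a row is useless for symbol lookup if either field is blank.
        if not debug_file.strip() or not debug_id.strip():
            continue
        writer.writerow([filename, debug_file, debug_id])
    return out.getvalue()
```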
Comment 5 • 14 years ago (Assignee)
Hey Ted,
So here is some sample output:
11.2.9117.0.nmcorePS.dll,nmcorePS.pdb,73387E65FD5D4F3A9B7CC306CBAF3CA41
1445070.dll,FirefoxExt35.pdb,1F888C3FB0FC426D9384F92307B382501
228078g07.dll,DGJR.pdb,A40D46D92CA54937A8520538E0A17D773
3gppttrenderer.dll,3gppttrenderer.pdb,780BC6C76FD04AC49E916DD05D78A8705
813250m16t.dll,MWDL.pdb,449BF30EF5BB4500A4696E80237C51DD2
821109m16t.dll,MWDL.pdb,449BF30EF5BB4500A4696E80237C51DD2
836312m16t.dll,MWDL.pdb,449BF30EF5BB4500A4696E80237C51DD2
ACE.dll,ACE.pdb,B477FA3428A740B7A0C625CB561F7A1C1
ACE.dll,ACE.pdb,FAD341203BFF471B84128EFF1FBF0F561
ACTIVEDS.DLL,activeds.pdb,3B7DE0562
If this looks good to you I can check this in and we can deploy this out somewhere so it can be executed on a daily basis.
Comment 6 • 14 years ago (Reporter)
This looks good. Looking at the output, though, I realize that you could drop column 1, since I don't actually need it. It looks like there are some DLLs there that change their name but not the other info (probably spyware/viruses), so it should reduce the size of the output.
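To illustrate the size reduction, here is a small Python sketch that drops column 1 from the sample output in comment 5 and counts what remains. The helper name `unique_debug_pairs` is hypothetical, and the sample is only the ten rows quoted above, not the full output.

```python
import csv
import io

# The ten sample rows from comment 5.
SAMPLE = """\
11.2.9117.0.nmcorePS.dll,nmcorePS.pdb,73387E65FD5D4F3A9B7CC306CBAF3CA41
1445070.dll,FirefoxExt35.pdb,1F888C3FB0FC426D9384F92307B382501
228078g07.dll,DGJR.pdb,A40D46D92CA54937A8520538E0A17D773
3gppttrenderer.dll,3gppttrenderer.pdb,780BC6C76FD04AC49E916DD05D78A8705
813250m16t.dll,MWDL.pdb,449BF30EF5BB4500A4696E80237C51DD2
821109m16t.dll,MWDL.pdb,449BF30EF5BB4500A4696E80237C51DD2
836312m16t.dll,MWDL.pdb,449BF30EF5BB4500A4696E80237C51DD2
ACE.dll,ACE.pdb,B477FA3428A740B7A0C625CB561F7A1C1
ACE.dll,ACE.pdb,FAD341203BFF471B84128EFF1FBF0F561
ACTIVEDS.DLL,activeds.pdb,3B7DE0562
"""


def unique_debug_pairs(csv_text):
    """Return the sorted unique (debug file, debug ID) pairs, ignoring column 1."""
    pairs = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pairs.add((row[1], row[2]))
    return sorted(pairs)
```

On this sample the three differently named DLLs that share MWDL.pdb and one debug ID collapse into a single pair, so ten rows become eight.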
Comment 7 • 14 years ago
I could use column 1 for some stuff I'm doing, so if possible let's keep it.
Comment 8 • 14 years ago (Reporter)
It's not a big deal either way for me, just something I realized after seeing the duplicate data in the output in comment 5.
Comment 9 • 14 years ago (Assignee)
I'll leave the DLL name in, then. Running the job against production data for 9/12/2010 yielded a list of 47,366 entries. I'm including my series of commands here for the record; I can document this on the Socorro wiki or somewhere similar later if we want.
hadoop jar socorro-analysis-job.jar com.mozilla.socorro.hadoop.CrashReportModuleList -Dproduct.filter="Firefox" -Dos.filter="Windows NT" -Dstart.date=20100912 -Dend.date=20100912 module-list-out
hadoop fs -getmerge module-list-out modulelist.txt
sort modulelist.txt -o modulelist.sorted
Comment 10 • 14 years ago (Assignee)
Going to set this to fixed. Will work with Laura and team for deployment.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 11 • 14 years ago (Reporter)
Okay. If you file followup bug(s) on deployment, please make them block bug 575817.
Comment 12 • 14 years ago
I forgot about this bug; I guess this is similar to the request in Bug 634498. Good to see it coming online soon.
Updated • 13 years ago
Component: Socorro → General
Product: Webtools → Socorro