634498 - setting up a data source for a variety of module analysis map reduce jobs.

Reporter

Description

14 years ago

Clone of the idea in https://bugzilla.mozilla.org/show_bug.cgi?id=634343#c7: it would be nice if we could run a number of jobs that run analysis over all the module lists of all crash reports. See:

- bug 634343, for the need to find mismatched .dll versions that lead to crashes resulting from incomplete installs.
- dbaron's module correlations would be another (the ratio of the presence of a .dll in a 1-day sample of crashes for a signature vs. the ratio in reports across all signatures).
- mapping of all suspected malware/unknown .dlls would be another.
- stuff like bug 634097, "Compare beta 10 hardware acceleration usage (the presence of d2d1.dll being loaded) to beta 11 hardware acceleration usage", is still another example.

These would all benefit from setting up a map of crash_rpt/module data list pairs, and maybe a few pieces of other crash metadata, so that additional map/reduce operations like those listed above could work efficiently.

-chris "maybe read just enough about hadoop to be dangerous" hofmann
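As a rough illustration of dbaron's module correlation idea, here is a minimal map/reduce-style sketch in plain Python. It assumes a hypothetical record layout of `(report_id, signature, modules)` per crash report (for example, a one-day sample); none of these names come from Socorro itself. The mapper emits one count per distinct module under both the report's signature and a catch-all key `"*"`, and the reducer sums them, so the per-signature ratio can be compared with the overall ratio.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical record layout: (report_id, signature, modules), where
# modules is the list of .dll names loaded in that crash report.
Report = tuple[str, str, list[str]]


def map_modules(report: Report) -> Iterator[tuple[tuple[str, str], int]]:
    """Emit one count per distinct module, keyed by (signature, module)
    and also by the pseudo-signature "*" that stands for all reports."""
    _report_id, signature, modules = report
    for module in {m.lower() for m in modules}:
        yield (signature, module), 1
        yield ("*", module), 1


def reduce_counts(pairs: Iterable[tuple[tuple[str, str], int]]) -> Counter:
    """Sum the emitted counts per key, as a reducer would."""
    totals: Counter = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def module_correlations(reports: list[Report], signature: str) -> dict[str, tuple[float, float]]:
    """For one signature, return {module: (ratio_in_signature, ratio_overall)}."""
    counts = reduce_counts(pair for r in reports for pair in map_modules(r))
    n_sig = sum(1 for r in reports if r[1] == signature)
    n_all = len(reports)
    result = {}
    for (sig, module), n in counts.items():
        if sig == signature:  # only modules seen with this signature
            result[module] = (n / n_sig, counts[("*", module)] / n_all)
    return result
```

A module whose first ratio is much higher than its second (present in most reports for the signature but rare overall) is the kind of correlation the description is after. In a real Hadoop job the mapper and reducer would run as separate tasks; here they are just chained generators.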

chris hofmann

Reporter

Comment 1

14 years ago

https://bugzilla.mozilla.org/show_bug.cgi?id=630201#c19 is also similar to the request in bug 634097

Laura Thomson :laura

Comment 2

14 years ago

Data to e.g. people and then in a later version to the UI? I feel like we ought to have somewhere better to put all these reports, since I think there are going to be more and more as time goes on, and some of them will be confidential. Jabba, any thoughts? This is what ted and I are talking about for bug 598098, too.

Justin Dow [:jabba]

Comment 3

14 years ago

Is this related to bug 620146 ?

chris hofmann

Reporter

Comment 4

14 years ago

Bug 620146 is about a place to do a variety of reporting experiments where we still have gaps in understanding crash data and in making the best use of the data we have. I think this is more directly similar to bug 594777, but for the kind of reporting we need here we would also have to add .dll version info to the output suggested in bug 594777 comment 5. None of the data in this particular output would be confidential. It's basically an output that makes the existing "module info" for all reports easy to get at, search, and use for sample counts and correlations.
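If .dll version info were added to the module output, each module entry could become a `(name, version)` pair. Assuming that hypothetical shape, a sketch of two obvious reductions: sample counts of module versions, and flagging reports where modules that should come from one install disagree on version (the incomplete-install symptom from bug 634343). The `shipped_together` set is an assumption supplied by the caller; this sketch does not claim to know which modules actually belong in it.

```python
from collections import Counter


def version_counts(reports):
    """Sample counts of (module, version) pairs over all reports.

    reports: iterable of (report_id, [(module_name, version), ...]).
    Each (module, version) pair is counted at most once per report.
    """
    counts = Counter()
    for _report_id, modules in reports:
        for name, version in {(n.lower(), v) for n, v in modules}:
            counts[(name, version)] += 1
    return counts


def mismatched_reports(reports, shipped_together):
    """Yield (report_id, {module: version}) for reports in which the modules
    expected to ship together (the caller-supplied, hypothetical set
    `shipped_together`) were loaded with more than one distinct version."""
    expected = {m.lower() for m in shipped_together}
    for report_id, modules in reports:
        versions = {n.lower(): v for n, v in modules if n.lower() in expected}
        if len(set(versions.values())) > 1:
            yield report_id, versions
```

Module names are lowercased because Windows file names are case-insensitive, so `B.dll` and `b.dll` are treated as the same module.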

Anurag Phadke[:aphadke@mozilla.com]

Comment 5

14 years ago

The data source mentioned by Laura in comment #2 will also help https://bugzilla.mozilla.org/show_bug.cgi?id=620180

chris hofmann

Reporter

Comment 6

14 years ago

Reposting some comments made on IRC as food for thought: there are three basic pieces of data that could be output daily, from which many different reports can be derived.

1) "crash metadata": [report_id] signature, url, ... This is basically all the stuff in the .csv files [1].
2) "module list data": [report_id] module1, module2, module3, ...
3) "stack data": [report_id] frame1, frame2, frame3, ...

From these 3 "maps" you could define just about all the custom reports that we are doing now, and many more interesting correlations and other reports that we need. Building most reports is just a matter of setting up a list of "interesting crash reports" that are a subset defined by things like product, release, signature, or other interesting combination pairs, then using that list of reports to do further reductions on module data or stack data (Jesse's frame2 report, and/or stuff at http://people.mozilla.org/crash_stacks/stack-summary-4.0b11pre.txt), or to do additional correlations on other pieces of metadata.

[1] metadata list:
1. signature
2. url
3. uuid_url
4. client_crash_date
5. date_processed
6. last_crash
7. product
8. version
9. build
10. branch
11. os_name
12. os_version
13. cpu_info
14. address
15. bug_list
16. user_comments
17. uptime_seconds
18. email
19. adu_count
20. topmost_filenames
21. addons_checked
22. flash_version
23. hangid
24. reason
25. process_type
26. app_notes
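A minimal sketch of the "three maps" approach, assuming each map is a plain dict keyed by report_id (a hypothetical in-memory stand-in for the daily output files): first select the "interesting" report IDs from the metadata, then reduce over module or stack data. The helper names are illustrative, not an existing Socorro API; the field names (`product`, `version`, `signature`) come from the metadata list above, while the version strings in the usage are hypothetical.

```python
from collections import Counter


def interesting_reports(metadata, **criteria):
    """Select report_ids whose metadata matches every field=value criterion,
    e.g. product="Firefox", version="4.0b11"."""
    return {rid for rid, meta in metadata.items()
            if all(meta.get(k) == v for k, v in criteria.items())}


def frame_counts(report_ids, stacks, index=1):
    """Count the frame at `index` over the selected reports' stacks
    (index=1 is the second frame, roughly a 'frame2' report).
    Stacks too short to have that frame are skipped."""
    return Counter(stacks[rid][index] for rid in report_ids
                   if rid in stacks and len(stacks[rid]) > index)


def module_presence(report_ids, modules, module):
    """Fraction of the selected reports (that have module data) which
    loaded `module`, compared case-insensitively."""
    ids = [rid for rid in report_ids if rid in modules]
    if not ids:
        return 0.0
    target = module.lower()
    hits = sum(1 for rid in ids if target in {m.lower() for m in modules[rid]})
    return hits / len(ids)
```

For example, `module_presence(interesting_reports(metadata, version="4.0b11"), modules, "d2d1.dll")` gives the share of beta 11 reports with d2d1.dll loaded; running it again with the beta 10 version string gives the bug 634097 comparison. Keeping the selection step separate from the reductions is what lets one "interesting reports" list feed any number of module or stack reports.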

Daniel Einspanjer [:dre] [:deinspanjer]

Comment 7

14 years ago

I really wish we could make some progress on the alternate processed json format. Once we have that, we can do another POC for the ElasticSearch and most of these analysis tasks could easily be done directly in there.

Laura Thomson :laura

Comment 8

14 years ago

(In reply to comment #7)
> I really wish we could make some progress on the alternate processed json
> format. Once we have that, we can do another POC for the ElasticSearch and
> most of these analysis tasks could easily be done directly in there.

Agreed. How do you guys feel about tackling that for 1.7.8?

chris hofmann

Reporter

Comment 9

14 years ago

(In reply to comment #8)
> (In reply to comment #7)
> > I really wish we could make some progress on the alternate processed json
> > format. Once we have that, we can do another POC for the ElasticSearch and
> > most of these analysis tasks could easily be done directly in there.
>
> Agreed. How do you guys feel about tackling that for 1.7.8?

And are there bugs on file to define and track that work?

(not currently active) Ted Mielczarek

Comment 10

14 years ago

bug 573100 is already targeted at 1.7.8.

Nobody; OK to take it and work on it

Assignee

Updated

13 years ago

Component: Socorro → General

Product: Webtools → Socorro

Lonnen :lonnen

Comment 11

8 years ago

we no longer have map reduce jobs

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WORKSFORME

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

People

(Reporter: chofmann, Unassigned)

Security

(public)
