Closed Bug 634498 Opened 14 years ago Closed 8 years ago

setting up a data source for a variety of module analysis map reduce jobs.

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: chofmann, Unassigned)

Details

clone of the idea in https://bugzilla.mozilla.org/show_bug.cgi?id=634343#c7
it would be nice if we could run a number of jobs that run analysis over the module lists of all crash reports. Examples:
- bug 634343: the need to find mismatched .dll versions that lead to crashes resulting from incomplete installs.
- dbaron's module correlations (ratio of the presence of a .dll in a 1 day sample of crashes for a signature vs. the ratio of its presence in reports across all signatures).
- mapping of all suspected malware/unknown .dlls.
- bug 634097: compare beta 10 hardware acceleration usage (the presence of d2d1.dll being loaded) to beta 11 hardware acceleration usage.
these would all benefit from setting up a map of crash_rpt/module data list pairs, and maybe a few pieces of other crash meta data, so additional map/reduce operations like those listed above could work efficiently.
-chris "maybe read just enough about hadoop to be dangerous" hofmann
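To make the module correlation idea in the description above concrete, here is a minimal Python sketch. It assumes the data source yields (signature, module list) pairs; that input shape and the function names `module_ratios` and `correlate` are illustrative assumptions, not part of Socorro or of dbaron's actual scripts.

```python
from collections import Counter


def module_ratios(module_lists):
    """Return {module: fraction of reports whose module list contains it}."""
    counts = Counter()
    for modules in module_lists:
        counts.update(set(modules))  # count each module once per report
    total = len(module_lists)
    return {m: c / total for m, c in counts.items()} if total else {}


def correlate(reports, signature):
    """Compare module presence within one signature against all reports.

    reports: iterable of (signature, [module names]) pairs (assumed shape).
    Returns (module, ratio_in_signature, ratio_overall) tuples, sorted so the
    modules most over-represented in the signature come first.
    """
    reports = list(reports)
    overall = module_ratios([mods for _, mods in reports])
    in_sig = module_ratios([mods for sig, mods in reports if sig == signature])
    rows = [(m, r, overall[m]) for m, r in in_sig.items()]
    return sorted(rows, key=lambda row: row[1] - row[2], reverse=True)
```

A module that shows up in nearly every report for one signature but only rarely overall sorts to the top, which is the kind of signal the correlation reports look for.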
Data to e.g. people and then in a later version to the UI? I feel like we ought to have somewhere better to put all these reports, since I think there are going to be more and more as time goes on, and some of them will be confidential. Jabba, any thoughts? This is what ted and I are talking about for bug 598098, too.
Is this related to bug 620146?
bug 620146 is about a place to do a variety of reporting experiments where we still have gaps in understanding crash data and making the best use of the data we have. I think this is more directly similar to bug 594777, but for the kind of reporting we need here we would also have to add .dll version info to the output suggested in bug 594777 comment 5. None of the data that would be in this particular output would be confidential. It's basically an output that makes the existing "module info" for all reports easy to get at, search, and use for sample counts and correlations.
data source as mentioned by laura in comment #2 will also help https://bugzilla.mozilla.org/show_bug.cgi?id=620180
reposting some comments made on irc as food for thought. there are three basic pieces of data that could be output daily from which many different reports can be derived.
1) "crash meta data": [report_id] signature, url, ... (this is basically all the stuff in the .csv files [1])
2) "module list data": [report_id] module1, module2, module3, ...
3) "stack data": [report_id] frame1, frame2, frame3, ...
from these 3 "maps" you could define just about all the custom reports that we are doing now, and many more interesting correlations and other reports that we need. building most reports is just a matter of setting up a list of "interesting crash reports" that are a subset of things like product, release, signature, or other interesting combinations, then using that list of reports to do further reductions on module data or stack data (jesse's frame2 report, and/or stuff at http://people.mozilla.org/crash_stacks/stack-summary-4.0b11pre.txt), or to do additional correlations on other pieces of meta data.
[1] meta data list
1 signature
2 url
3 uuid_url
4 client_crash_date
5 date_processed
6 last_crash
7 product
8 version
9 build
10 branch
11 os_name
12 os_version
13 cpu_info
14 address
15 bug_list
16 user_comments
17 uptime_seconds
18 email
19 adu_count
20 topmost_filenames
21 addons_checked
22 flash_version
23 hangid
24 reason
25 process_type
26 app_notes
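As a rough illustration of the three "maps" in the comment above and the select-then-reduce flow it describes, here is a small Python sketch. The in-memory dictionaries, the sample report ids, signatures, versions and frame names, and the helper names `interesting_reports`, `module_counts`, and `frame_counts` are all made up for illustration; the real data would come from whatever daily output this bug ends up producing.

```python
from collections import Counter

# Hypothetical stand-ins for the three daily outputs, each keyed by report id.
crash_meta = {
    "r1": {"signature": "sigA", "product": "Firefox", "version": "4.0b11"},
    "r2": {"signature": "sigA", "product": "Firefox", "version": "4.0b10"},
    "r3": {"signature": "sigB", "product": "Firefox", "version": "4.0b11"},
}
module_lists = {
    "r1": ["xul.dll", "d2d1.dll"],
    "r2": ["xul.dll"],
    "r3": ["xul.dll", "d2d1.dll"],
}
stacks = {
    "r1": ["f_top", "f_caller", "f_main"],
    "r2": ["f_top", "f_other"],
    "r3": ["g_top"],
}


def interesting_reports(meta, **criteria):
    """Select report ids whose meta data matches every given field value."""
    return {
        rid for rid, fields in meta.items()
        if all(fields.get(key) == value for key, value in criteria.items())
    }


def module_counts(report_ids, modules):
    """Count how many of the selected reports loaded each module."""
    counts = Counter()
    for rid in report_ids:
        counts.update(set(modules.get(rid, [])))
    return counts


def frame_counts(report_ids, stack_map, depth):
    """Count the frame at a given stack depth across the selected reports."""
    return Counter(
        stack_map[rid][depth]
        for rid in report_ids
        if rid in stack_map and len(stack_map[rid]) > depth
    )
```

For example, `module_counts(interesting_reports(crash_meta, version="4.0b11"), module_lists)` gives per-module counts for one version, and `frame_counts(..., stacks, 1)` is one way a second-frame summary could be derived from the same selection.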
I really wish we could make some progress on the alternate processed json format. Once we have that, we can do another POC for the ElasticSearch and most of these analysis tasks could easily be done directly in there.
(In reply to comment #7)
> I really wish we could make some progress on the alternate processed json
> format. Once we have that, we can do another POC for the ElasticSearch and
> most of these analysis tasks could easily be done directly in there.
Agreed. How do you guys feel about tackling that for 1.7.8?
(In reply to comment #8)
> (In reply to comment #7)
> > I really wish we could make some progress on the alternate processed json
> > format. Once we have that, we can do another POC for the ElasticSearch and
> > most of these analysis tasks could easily be done directly in there.
>
> Agreed. How do you guys feel about tackling that for 1.7.8?
and are there bugs on file to define and track that work?
bug 573100 is already targeted at 1.7.8.
Component: Socorro → General
Product: Webtools → Socorro
we no longer have map reduce jobs
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME