
setting up a data source for a variety of module analysis map reduce jobs.

RESOLVED WORKSFORME

Status

Socorro
General
RESOLVED WORKSFORME
8 years ago
11 months ago

People

(Reporter: chris hofmann, Unassigned)

Tracking

Trunk
x86
Mac OS X

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 years ago
clone of the idea in https://bugzilla.mozilla.org/show_bug.cgi?id=634343#c7

it would be nice if we could run a number of jobs that run analysis over all 
the module lists of all crash reports.  

See:

- bug 634343 for the need to find mismatched .dll versions that lead to crashes that result from incomplete installs

dbaron's module correlations would be another (the ratio of reports with a given .dll present in a 1 day sample of crashes for a signature vs. the ratio of reports with it present across all signatures.)

mapping of all suspected malware/unknown .dlls is another,

-- and stuff like in [Bug 634097] Compare beta 10 hardware acceleration usage (the presence of d2d1.dll being loaded) to beta 11 hardware acceleration usage is still another example.

these all would benefit by setting up a map of crash_rpt/module data list pairs, and maybe a few pieces of other crash meta data, so additional map/reduce operations like those listed above could work efficiently.
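
A rough sketch of the module correlation idea over such a map, in Python. The input shapes (a dict of report id to signature, and a dict of report id to module list) and the function name are assumptions for illustration only, not anything Socorro provides.

```python
from collections import Counter

def module_correlations(signatures, modules, target_signature):
    """For each module seen in crashes with target_signature, return
    (ratio of those crashes loading it, ratio of all crashes loading it).

    signatures: dict of report_id -> signature (hypothetical shape)
    modules:    dict of report_id -> list of module names (hypothetical shape)
    """
    overall = Counter()
    in_sig = Counter()
    total = 0
    sig_total = 0
    for report_id, mods in modules.items():
        unique = set(mods)  # count each module once per report
        total += 1
        overall.update(unique)
        if signatures.get(report_id) == target_signature:
            sig_total += 1
            in_sig.update(unique)
    if sig_total == 0:
        return {}
    return {m: (in_sig[m] / sig_total, overall[m] / total) for m in in_sig}

# Made-up sample data, just to exercise the function.
signatures = {"r1": "sigA", "r2": "sigA", "r3": "sigB", "r4": "sigB"}
modules = {
    "r1": ["xul.dll", "d2d1.dll"],
    "r2": ["xul.dll", "d2d1.dll"],
    "r3": ["xul.dll"],
    "r4": ["xul.dll"],
}
correlations = module_correlations(signatures, modules, "sigA")
```

A module whose first ratio is much higher than its second is over-represented in that signature and worth a closer look.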

-chris "maybe read just enough about hadoop to be dangerous" hofmann
(Reporter)

Comment 1

8 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=630201#c19 is also similar to the request in bug 634097

Comment 2

8 years ago
Data to e.g. people and then in a later version to the UI?

I feel like we ought to have somewhere better to put all these reports, since I think there are going to be more and more as time goes on, and some of them will be confidential.  Jabba, any thoughts?

This is what ted and I are talking about for bug 598098, too.

Comment 3

8 years ago
Is this related to bug 620146 ?
(Reporter)

Comment 4

8 years ago
 bug 620146 is about a place to do a variety of reporting experiments where we still have gaps in understanding crash data and making the best use of the data we have.

I think this is more directly similar to Bug 594777, but for the kind of reporting we need here we would also have to add .dll version info to the output suggested in bug 594777 comment 5.

none of the data that would be in this particular output would be confidential.  It's basically an output that makes access to the existing "module info" for all reports easy to get at, search, and do sample counts and correlations.
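
To illustrate why the version info matters, here is a minimal Python sketch of a mismatched-version check. It assumes the output gives a list of (module name, version) pairs per report, and that some group of modules is expected to share one version; both the input shape and the example group are hypothetical.

```python
def find_version_mismatches(module_versions, expected_group):
    """Return reports where modules in expected_group disagree on version.

    module_versions: dict of report_id -> list of (name, version) pairs
                     (hypothetical shape for the proposed output)
    expected_group:  set of module names assumed to ship with the same
                     version (an assumption, not a verified list)
    """
    mismatched = {}
    for report_id, mods in module_versions.items():
        versions = {name: ver for name, ver in mods if name in expected_group}
        if len(set(versions.values())) > 1:
            mismatched[report_id] = versions
    return mismatched

# Made-up sample data.
module_versions = {
    "r1": [("xul.dll", "2.0.0.1"), ("mozjs.dll", "2.0.0.1")],
    "r2": [("xul.dll", "2.0.0.1"), ("mozjs.dll", "1.9.2.0")],
    "r3": [("xul.dll", "2.0.0.1"), ("other.dll", "6.1")],
}
suspects = find_version_mismatches(module_versions, {"xul.dll", "mozjs.dll"})
```

Reports that load only one module from the group, like r3 here, are not flagged.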
No longer blocks: 620180
data source as mentioned by laura in comment #2 will also help https://bugzilla.mozilla.org/show_bug.cgi?id=620180
(Reporter)

Comment 6

8 years ago
reposting some comments made on irc as food for thought

there are three basic pieces of data that could be output daily from which many different reports can be derived.  

1) "crash meta data" -- [report_id] signature, url, 
                       .. this is basically all the stuff in the .csv files [1]

2) "module list data"   [report_id] module1, module2, module3, ...

3) "stack data"         [report_id] frame1, frame2, frame3, ...


from these 3 "maps" you could define just about all the custom reports that we are doing now, and many more interesting correlations and other reports that we need.

building most reports is just a matter of setting up a list of "interesting crash reports" that are a subset of things like product, release, signature or other interesting combination pairs, then using that list of reports to do further reductions on module data, stack data (jesse's frame2 report, and/or stuff at http://people.mozilla.org/crash_stacks/stack-summary-4.0b11pre.txt )
or do additional correlations on other pieces of meta data.
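
The select-then-reduce pattern above could be sketched roughly like this in Python. The three dicts stand in for the daily "maps"; their shapes, the sample values, the helper names, and treating "frame2" as the second frame of the stack are all assumptions for illustration.

```python
from collections import Counter

# Made-up samples of the three "maps", each keyed by report id.
meta = {
    "r1": {"product": "Firefox", "version": "4.0b11", "signature": "sigA"},
    "r2": {"product": "Firefox", "version": "4.0b11", "signature": "sigB"},
    "r3": {"product": "Firefox", "version": "4.0b10", "signature": "sigA"},
}
module_lists = {
    "r1": ["xul.dll", "d2d1.dll"],
    "r2": ["xul.dll"],
    "r3": ["xul.dll", "d2d1.dll"],
}
stacks = {
    "r1": ["frameA", "frameB", "frameC"],
    "r2": ["frameA", "frameD"],
    "r3": ["frameE"],
}

def select_reports(meta, **criteria):
    """Pick the "interesting" report ids whose meta data matches every criterion."""
    return {
        rid for rid, fields in meta.items()
        if all(fields.get(key) == value for key, value in criteria.items())
    }

def count_modules(module_lists, report_ids):
    """Count how many of the selected reports loaded each module."""
    counts = Counter()
    for rid in report_ids:
        counts.update(set(module_lists.get(rid, [])))
    return counts

def count_frame(stacks, report_ids, index):
    """Count the frame at a given stack position across the selected reports."""
    counts = Counter()
    for rid in report_ids:
        frames = stacks.get(rid, [])
        if index < len(frames):
            counts[frames[index]] += 1
    return counts

interesting = select_reports(meta, product="Firefox", version="4.0b11")
module_counts = count_modules(module_lists, interesting)
frame2_counts = count_frame(stacks, interesting, 1)  # index 1 = second frame
```

The same report id list feeds every reduction, so each new report is mostly a different set of selection criteria.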

[1]
meta data list
1 signature
2 url
3 uuid_url
4 client_crash_date
5 date_processed
6 last_crash
7 product
8 version
9 build
10 branch
11 os_name
12 os_version
13 cpu_info
14 address
15 bug_list
16 user_comments
17 uptime_seconds
18 email
19 adu_count
20 topmost_filenames
21 addons_checked
22 flash_version
23 hangid
24 reason
25 process_type
26 app_notes

Comment 7

I really wish we could make some progress on the alternate processed json format.  Once we have that, we can do another POC for the ElasticSearch and most of these analysis tasks could easily be done directly in there.

Comment 8

8 years ago
(In reply to comment #7)
> I really wish we could make some progress on the alternate processed json
> format.  Once we have that, we can do another POC for the ElasticSearch and
> most of these analysis tasks could easily be done directly in there.

Agreed.  How do you guys feel about tackling that for 1.7.8?
(Reporter)

Comment 9

8 years ago
(In reply to comment #8)
> (In reply to comment #7)
> > I really wish we could make some progress on the alternate processed json
> > format.  Once we have that, we can do another POC for the ElasticSearch and
> > most of these analysis tasks could easily be done directly in there.
> 
> Agreed.  How do you guys feel about tackling that for 1.7.8?

and are there bugs on file to define and track that work?
bug 573100 is already targeted at 1.7.8.
(Assignee)

Updated

7 years ago
Component: Socorro → General
Product: Webtools → Socorro

Comment 11

11 months ago
we no longer have map reduce jobs
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → WORKSFORME