Closed Bug 554373 Opened 15 years ago Closed 15 years ago

Correlation Reports API

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ozten, Assigned: xstevens)

References

Details

(Whiteboard: Implemented feedback and now testing speed with production dataset)

Currently dbaron's correlation reports are hosted on people as giant text files. His code should be migrated to HBase/Hadoop, and a report or API should be created to make accessing this data easier and more robust.
Whiteboard: cloud
Blocks: 554374
Assignee: nobody → xstevens
What's the status here?
I've been working out some bugs and trying to improve performance.
I've got a working implementation. It does the counting with Hadoop MapReduce and then reads that output with a Python script for formatting. I checked the Python code into the crash-data-tools project, where dbaron's original code lives, and the MapReduce code into moco/metrics/hadoop/crash-reports. I don't really want things to live in two separate projects, so I'm open to feedback if anyone has suggestions.
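As a rough illustration of the counting step, here is a minimal single-process Python sketch of what a map and reduce pair for these correlations might compute. It assumes each crash record is a dict with product, version, os, signature and modules keys; the ANY_SIG wildcard and the key layout are hypothetical and are not taken from the moco/metrics/hadoop/crash-reports code.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical wildcard standing in for "any signature" in aggregate keys.
ANY_SIG = "*"


def map_crash(crash: dict) -> Iterator[tuple]:
    """Map step: emit (key, 1) for every count this crash contributes to."""
    prefix = (crash["product"], crash["version"], crash["os"])
    for sig in (crash["signature"], ANY_SIG):
        # Total crashes for this signature (or for all signatures).
        yield prefix + (sig, None), 1
        # Crashes for this signature that loaded each module (counted once).
        for module in set(crash["modules"]):
            yield prefix + (sig, module), 1


def reduce_counts(pairs: Iterable[tuple]) -> Counter:
    """Reduce step: sum the emitted counts per key."""
    counts: Counter = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def count_day(crashes: Iterable[dict]) -> Counter:
    """Run map then reduce over one day's crashes."""
    return reduce_counts(pair for crash in crashes for pair in map_crash(crash))
```

With these counts, a module's sigCount would be counts[prefix + (sig, module)] and its totalSigCount counts[prefix + (sig, None)]; the signature-independent figures come from the same keys with ANY_SIG in place of the signature.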
You could put the Python into the Socorro repo, but that doesn't solve the two-repos problem (it would be socorro + metrics).
What does the API for getting these correlations out look like? The current prod integration is a hack. For each type of correlation report, we'll want a clean API for: 1) accessing correlations by prod/version/os/signature, and 2) accessing multiple correlations by prod/version/os/list_of_signatures.
If we can limit retrieval of correlation data to *always* requiring at least a prod/version/os/ prefix, then we can set up a correlations table with a rowkey of prod/version/os/signature. That would allow an API to retrieve the correlation data for one specific signature, or even to scan all signatures for a given prod/version/os/ prefix. Does this sound useful? More importantly, can you think of cases where this wouldn't work?
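A minimal sketch of the proposed rowkey layout, assuming rowkeys of the form product/version/os/signature. The in-memory CorrelationTable is a hypothetical stand-in for an HBase table (HBase keeps rows sorted by rowkey, which is what makes a prefix scan cheap); it is not an HBase client API.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class CorrelationTable:
    """Hypothetical in-memory stand-in for an HBase table sorted by rowkey."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, dict] = {}

    def put(self, rowkey: str, data: dict) -> None:
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = data

    def get(self, rowkey: str) -> Optional[dict]:
        return self._rows.get(rowkey)

    def scan_prefix(self, prefix: str) -> List[Tuple[str, dict]]:
        # Sorted keys sharing a prefix are contiguous: start at the first
        # key >= prefix and stop at the first key that no longer matches.
        out = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out


def rowkey(product: str, version: str, os_name: str, signature: str = "") -> str:
    # An empty signature yields the bare "prod/version/os/" scan prefix.
    return f"{product}/{version}/{os_name}/{signature}"


def get_many(table: CorrelationTable, product: str, version: str,
             os_name: str, signatures: List[str]) -> Dict[str, Optional[dict]]:
    """The list_of_signatures case: point lookups under one prefix."""
    return {s: table.get(rowkey(product, version, os_name, s)) for s in signatures}
```

The trailing '/' in the prefix matters: scanning Firefox/3.6/Windows NT/ does not pick up Firefox/3.6.8 rows. Because the signature is the last field, it can contain '/' without breaking prefix scans, as long as product, version and OS names never do.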
Summary: Correlation reports should be generated in hadoop report system → Correlation Reports API
Whiteboard: cloud → Basic design has been started
I created an MR job to count all of the product/version/os/signature combinations for a given day. These counts can then be used to produce a correlation report via the REST API, for example: http://cm-hadoop01:8080/correlation-report/report/20100701/Thunderbird/3.0/Windows%20NT/JS_CallTracer%7CEXCEPTION_ACCESS_VIOLATION
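A small helper for building these URLs, assuming the path layout in the example above (day/product/version/os/signature). Each segment is percent-encoded on its own, which is how "Windows NT" becomes Windows%20NT and the '|' separator becomes %7C. report_url is a hypothetical name; the optional signature covers the bulk form that takes no signature.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://cm-hadoop01:8080/correlation-report/report"


def report_url(day: str, product: str, version: str, os_name: str,
               signature: Optional[str] = None) -> str:
    parts = [day, product, version, os_name]
    if signature is not None:
        parts.append(signature)
    # safe="" percent-encodes every reserved character in each segment,
    # so ' ' -> %20, '|' -> %7C and '/' -> %2F.
    return BASE + "/" + "/".join(quote(p, safe="") for p in parts)
```

For instance, report_url("20100701", "Thunderbird", "3.0", "Windows NT", "JS_CallTracer|EXCEPTION_ACCESS_VIOLATION") reproduces the URL above.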
Whiteboard: Basic design has been started → Implementation is nearing completion - would love to get feedback
(In reply to comment #7)
Great work! This API is very similar to what we have now. We can ship it and change the semantics of what is in the correlation report, or tweak this API to match the prod data. There is also one additional API in use against dbaron's correlation reports (flat files). Details:

Correlations are placed on the following screens in production:
/report/list
/report/index/{uuid}
/topcrasher/{Product}/{Version}

Existing page:
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=UserCallWinProcCheckWow&version=Firefox%3A3.6.8#modver

Snippet (Modules):
EXCEPTION_ACCESS_VIOLATION (12411)
97% (12031/12411) vs. 61% (98974/162039) shdocvw.dll
52% (6469/12411) vs. 19% (30824/162039) msvcr71.dll
41% (5054/12411) vs. 9% (13901/162039) nppdf32.dll
89% (11035/12411) vs. 58% (94239/162039) samlib.dll
99% (12229/12411) vs. 68% (110292/162039) nssckbi.dll

API output:
"interesting-modules":[ {
"module":"oleaut32.dll",
"sigCount":2,
"totalSigCount":184,
"sigPercent":1.0869565,
"osCount":2,
"totalOsCount":184,
"osPercent":1.0869565

If we break down '97% (12031/12411) vs. 61% (98974/162039) shdocvw.dll' from the first example into the new API's output variables:
module = shdocvw.dll
sigCount = 12031
totalSigCount = 12411
sigPercent = 97%

Missing is a group of properties for "all crashes that match, ignoring the crash signature":
overallCount = 98974
totalCount = 162039
overallPercent = 61%

osCount, totalOsCount, and osPercent don't exist today and are new properties. They look fine, although I'm not sure about the names (osCount, etc.).

Another issue is that we have a bulk version of this API that does not take a signature. It would be something like:
http://cm-hadoop01:8080/correlation-report/report/{Day}/{Product}/{Version}/{OS Name}
The results contain a list of signatures, and each signature carries only its highest correlation.
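To make the field mapping concrete, here is a sketch that parses one line of the existing flat-file snippet into the proposed field names. The regular expression assumes the exact "NN% (a/b) vs. NN% (c/d) module" layout shown in the snippet, and parse_line is a hypothetical helper. Percentages are recomputed from the counts on the same 0-100 scale as the API's sigPercent (2/184 gives 1.0869565).

```python
import re

# Assumed layout: "97% (12031/12411) vs. 61% (98974/162039) shdocvw.dll"
LINE_RE = re.compile(
    r"\d+% \((?P<sigCount>\d+)/(?P<totalSigCount>\d+)\)"
    r" vs\. \d+% \((?P<overallCount>\d+)/(?P<totalCount>\d+)\)"
    r" (?P<module>\S+)"
)


def parse_line(line: str) -> dict:
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized correlation line: {line!r}")
    out = {"module": m.group("module")}
    for key in ("sigCount", "totalSigCount", "overallCount", "totalCount"):
        out[key] = int(m.group(key))
    # The flat file rounds to whole percents; recompute at full precision.
    out["sigPercent"] = 100.0 * out["sigCount"] / out["totalSigCount"]
    out["overallPercent"] = 100.0 * out["overallCount"] / out["totalCount"]
    return out
```

Header lines such as "EXCEPTION_ACCESS_VIOLATION (12411)" are rejected with a ValueError rather than silently misparsed.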
To see this API in action, check out http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6.8 Look in the Correlation column and click on one with data.
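For the bulk form, each signature keeps only its top correlation. The bug doesn't say how "highest" is ranked, so this sketch assumes the largest gap between sigPercent and overallPercent; top_correlation and bulk_report are hypothetical names, and the input is a per-signature list of module dicts like the ones above.

```python
from typing import Dict, List


def top_correlation(modules: List[dict]) -> dict:
    # Assumed ranking: how far the signature's rate exceeds the overall rate.
    return max(modules, key=lambda m: m["sigPercent"] - m["overallPercent"])


def bulk_report(by_signature: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Keep one (top) correlation per signature; skip signatures with none."""
    return {sig: top_correlation(mods) for sig, mods in by_signature.items() if mods}
```

On the snippet's numbers this ranking would pick shdocvw.dll (97% vs. 61%, a 36-point gap) over msvcr71.dll (33 points) and the rest.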
It would be nice if we split signature in the output into signature crash_reason This way the UI doesn't have to split on '|'.
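A sketch of the split, assuming the crash reason is always the text after the last '|' and never contains '|' itself. Splitting on the last separator keeps hang signatures such as "hang | KiFastSystemCallRet" (from the staging URL in this bug) intact. split_signature is a hypothetical helper.

```python
def split_signature(key: str) -> dict:
    # rpartition splits on the LAST '|'; if there is none, sep is ''.
    signature, sep, crash_reason = key.rpartition("|")
    if not sep:
        return {"signature": key, "crash_reason": None}
    return {"signature": signature, "crash_reason": crash_reason}
```

A plain split('|') would instead break "hang | KiFastSystemCallRet|EXCEPTION_BREAKPOINT" into three pieces.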
Austin, just to be clear: osCount, totalOsCount, and osPercent are calculated the same way they are currently. They represent "all crashes that match, ignoring the crash signature". I can rename them to overallCount, totalCount, and overallPercent. I've split crash_reason out from signature in the return value.
Okay, perfect. I thought they were different (based on IRC conversation).
Firefox numbers are always easier for me to look at: http://cm-hadoop01:8080/correlation-report/report/20100701/Firefox/3.6.4/Windows%20NT/hang%20%7C%20KiFastSystemCallRet%7CEXCEPTION_BREAKPOINT Again, this is staging, so don't expect these to match production numbers for this day just yet.
(In reply to comment #12) Just confirming there is no specific question for me here.
Nope. I'm working on some of the changes you suggested including adding top crashers.
Whiteboard: Implementation is nearing completion - would love to get feedback → Implemented feedback and now testing speed with production dataset
This functionality is now complete, but it will probably need code review, documentation, etc. before we deploy.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro