We can very easily iterate over the processed reports for a given day using the Python service, or we can write a MapReduce job that will export the data as a tarball. The day we switch production to 1.7 would have half the data in one system and half in the other, though, so it would be best to notify people that reports for that day will either be incomplete or delayed by a day (so we can slurp in the partial data from NFS).
I wouldn't worry about losing a day's worth of reports at a switchover point. A possibly even better alternative would be to make the scripts run as a MapReduce job rather than taking a tarball as input... and depending on the performance of said MapReduce job, possibly just stop generating text files and let people run the queries when they want (which would probably be less work total).
I definitely want to take the report generation and turn it into a MapReduce job, but I'm currently thinking that will happen in the Socorro 1.8 timeframe rather than 1.7. Hence I'm trying to put a simple stopgap measure in place that will allow your existing scripts to run with very little change. That said, if you could point me at the scripts, we'll absolutely evaluate how much work it might take to do them in time for 1.7.
Oh, I think Aravind already pointed me at them. Is this they?
http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/file/tip/per-crash-interesting-modules.py
http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/file/tip/per-crash-core-count.py
Yes, but also see crashfinder.py.
If we defer this to 1.8, it means we won't have any correlation reports in the interim.
Assignee: nobody → deinspanjer
Target Milestone: 1.8 → 1.7
Version: 1.8 → 1.7
Xavier, see http://people.mozilla.com/crash_analysis/ for the destination of the files. Please ping laura on #breakpad for education on how they are used.
(In reply to comment #6) Once this work is complete, please apply it to the previously missed days, 6/11 and later.
Not quite ready for full use in production yet: we are working on a "salted" TableInputFormat for HBase to reduce load on the cluster when running MR jobs. Other than that, though, the code works, and Aravind has a cron job ready to run once we green-light it. In the meantime, I have run the Hadoop version of the correlation reports for the missing days (6/11-6/15 and 6/17).
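For context on why a "salted" input format helps: date-prefixed row keys are sequential, so a plain scan hammers one HBase region at a time. Salting spreads keys across buckets, and the input format can then run one scan per bucket in parallel. A minimal Python sketch of the idea follows; the actual Socorro reader (MultiScanTableInputFormat) is Java, and the bucket count and key layout here are hypothetical, not the production values.

```python
import hashlib

NUM_SALTS = 16  # hypothetical bucket count; the real deployment may differ


def salted_key(row_key: str) -> str:
    """Prefix a row key with a deterministic salt bucket so that
    sequential keys (e.g. date-prefixed crash IDs) spread across all
    regions instead of hot-spotting on one region server."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_SALTS
    return f"{bucket:02x}:{row_key}"


def scan_ranges(start: str, stop: str):
    """One (start, stop) range per salt bucket. An input format can hand
    each range to its own mapper, which is the essence of the
    multi-scan approach described above."""
    return [(f"{b:02x}:{start}", f"{b:02x}:{stop}") for b in range(NUM_SALTS)]
```

Reads then enumerate all salt buckets for the wanted date range rather than issuing a single contiguous scan.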
Code has been checked in to Socorro, and documentation has been added to the Socorro wiki. This includes the "salted" reader (a.k.a. MultiScanTableInputFormat).
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro