Closed
Bug 798837
Opened 12 years ago
Closed 11 years ago
populate postgresql with output of correlation report job
Categories
(Socorro :: Backend, task)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 875990
People
(Reporter: rhelmer, Assigned: rhelmer)
References
Details
Attachments
(4 obsolete files)
Per bug 798153 comment 1, we need a stopgap until the new way of generating reports is ready (bug 650904).

We need a daily cron job that fetches the correlation reports (example: https://crash-analysis.mozilla.com/crash_analysis/20121006/20121006_Firefox_17.0a2-core-counts.txt), parses them, and inserts the results into postgres.

The old app does this from the UI code, which is the primary reason it frequently times out and is very slow when it does work:
https://github.com/mozilla/socorro/blob/master/webapp-php/application/controllers/correlation.php
https://github.com/mozilla/socorro/blob/master/webapp-php/application/libraries/Correlation.php
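For illustration, the fetch step could start by building the report URL for a given day. This is a minimal sketch that assumes every report follows the same naming pattern as the example URL above; the function name and the `report` parameter are hypothetical, and only the "core-counts" suffix is confirmed by that example.

```python
from datetime import date

# Base path taken from the example URL in this bug.
BASE_URL = "https://crash-analysis.mozilla.com/crash_analysis"


def correlation_report_url(day, product, version, report="core-counts"):
    """Build the URL of one daily correlation report (pattern is assumed)."""
    stamp = day.strftime("%Y%m%d")
    return "%s/%s/%s_%s_%s-%s.txt" % (
        BASE_URL, stamp, stamp, product, version, report)


url = correlation_report_url(date(2012, 10, 6), "Firefox", "17.0a2")
```

A cron job could call this once per product/version pair and hand the result to whatever does the download.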
Updated•12 years ago (by assignee)
Status: NEW → ASSIGNED
Component: Webapp → Backend
Updated•12 years ago (by assignee)
Assignee: nobody → rhelmer
Comment 1•12 years ago (by assignee)
Actually it would be more sensible to just modify the existing correlations job (the one that generates these reports) to drop a CSV, and import that into postgres, instead of writing a parser for its (human-readable) custom output format. I'll investigate that and morph the bug if it looks easy to do.
Updated•12 years ago (by assignee)
Summary: daily cronjob to parse correlation report, populate postgresql → populate postgresql with output of correlation report job
Comment 2•12 years ago (by assignee)
Comment 3•12 years ago (by assignee)
We could integrate this better into Socorro instead of bolting this onto the side, but for the moment I want to be able to get correlation reports into the DB without having to parse the current output files (and I don't want to disturb the current process).
Attachment #669032 -
Attachment is obsolete: true
Attachment #669039 -
Flags: review?(dbaron)
Attachment #669039 -
Flags: feedback?(chris.lonnen)
Comment 4•12 years ago (by assignee)
Here is more info on this process, for anyone uninitiated.

The code lives outside of Socorro:
http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/

The script that drives this here on the servers (lives in IT's puppet repo, not in Socorro proper):
/data/bin/cron_libraries.sh

The cronjob pulls some crash report IDs out of postgres and then uses hbaseClient.py to pull the corresponding processed JSON, and dumps it to files such as:
/tmp/Firefox_18.0a1.tar

If you pull that file ^ from prod, you could test the various modes like this:

python per-crash-core-count.py -p Firefox -r 18.0a1 -f Firefox_18.0a1.tar -d Firefox_18.0a1-core-counts.csv > Firefox_18.0a1-core-counts.txt

python per-crash-interesting-modules.py -v -p Firefox -r 18.0a1 -f Firefox_18.0a1.tar -d Firefox_18.0a1-interesting-modules-with-versions.csv > Firefox_18.0a1-interesting-modules-with-versions.txt

The latter also has "-a" (addons), and can generate reports with/without versions.

For our purposes, I think we only need these three:
* core-counts
* addons (with versions)
* modules (with versions)

If I can get those three loaded into raw tables, I believe I can do everything we need to reproduce the (currently somewhat slow/broken) UI.
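One possible shape for the "load into raw tables" step, as a sketch only: build a PostgreSQL COPY statement whose column list comes from the CSV header row. This assumes the CSV files start with a header row and that each raw table uses the same column names; the table name below is made up, and nothing here is taken from the actual patch.

```python
import csv
import io


def copy_statement(table, csv_file):
    """Return a COPY ... FROM STDIN statement matching the CSV header row.

    `table` must come from trusted code, since it is interpolated as-is.
    """
    header = next(csv.reader(csv_file))
    # Rewind so the caller can stream the whole file, header included.
    csv_file.seek(0)
    columns = ", ".join('"%s"' % name for name in header)
    return "COPY %s (%s) FROM STDIN WITH (FORMAT csv, HEADER true)" % (
        table, columns)


# Tiny in-memory stand-in for a real core-counts CSV file.
sample = io.StringIO("product,version,os_name\nFirefox,18.0a1,Windows NT\n")
statement = copy_statement("correlations_core_counts_raw", sample)
```

With psycopg2, the statement and the rewound file could then be passed to `cursor.copy_expert(statement, csv_file)`, which avoids row-by-row inserts.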
Comment 5•12 years ago (by assignee)
* store signature and reason in separate columns
* use " not ' for consistency with existing code
Attachment #669039 -
Attachment is obsolete: true
Attachment #669039 -
Flags: review?(dbaron)
Attachment #669039 -
Flags: feedback?(chris.lonnen)
Attachment #669192 -
Flags: review?(dbaron)
Attachment #669192 -
Flags: feedback?(chris.lonnen)
Comment 6•12 years ago
Comment on attachment 669192 [details] [diff] [review]
add optional CSV output to correlation reports

In both files:

>         signame = signame + "|" + crash["reason"]
>     signature = osys["signatures"].setdefault(signame,
>                     { "count": 0,
>-                      "core_counts": {} })
>+                      "core_counts": {},
>+                      "sig": crash["signature"],
>+                      "reason": crash["reason"] })

you want this "sig" to include all of the munging that happened to signame prior to the line at the top of the quote (i.e., the removal of numeric stuff). Otherwise it's not true for all of the crashes described. Easy to fix by introducing a new variable (say, sig).

>+ headers = ["product", "version", "os_name", "signature", "reason",
>+     "total_signature_count", "total_os_count", "sig_count",
>+     "sig_ratio", "os_count", "os_ratio", "family", "core_count"]

>diff -r 7cdfea7cdb3b per-crash-interesting-modules.py
>+ headers = ("product", "version", "os_name", "signature",
>+     "total_signature_count", "total_os_count", "libname",
>+     "in_signature_count", "in_signature_ratio",
>+     "in_signature_versions", "in_signature_versions_ratio",
>+     "in_os_count", "in_os_ratio", "in_os_versions",
>+     "in_os_versions_ratio", "version")

These header names could perhaps be a bit more consistent between the two files (e.g., sig_count and sig_ratio vs. in_signature_count and in_signature_ratio). But they could also be named a bit more clearly (introducing some differences), though I'm not sure how to keep that short:
  total_signature_count -> os_crashes_with_signature
  total_os_count -> os_crashes
  sig_count -> os_crashes_with_signature_and_core_count
  sig_ratio -> core_count_portion_for_signature_and_os
  os_count -> os_crashes_with_core_count
  os_ratio -> core_count_portion_for_os

>+
>+ if options.csv_filename:
>+     writer.writerow([options.product, options.release,
>+         osname, sig["sig"], sig["reason"],
>+         sig["count"], osys["count"],
>+         module["libname"].encode("UTF-8"),
>+         module["in_sig_count"],
>+         int(round(module["in_sig_ratio"] *
>+             100)),
>+         sig_ver_count,
>+         int(round(float(sig_ver_count) /
>+             sig["count"] * 100)),
>+         module["in_os_count"],
>+         int(round(module["in_os_ratio"] *
>+             100)),
>+         os_ver_count,
>+         int(round(float(os_ver_count) /
>+             osys["count"] * 100)),
>+         version])
>+
>+ else:
>+     if options.csv_filename:
>+         writer.writerow([options.product, options.release,
>+             osname, sig["sig"], sig["reason"],
>+             sig["count"], osys["count"],
>+             module["libname"].encode("UTF-8"),
>+             module["in_sig_count"],
>+             int(round(module["in_sig_ratio"] *
>+                 100)),
>+             None,
>+             None,
>+             module["in_os_count"],
>+             int(round(module["in_os_ratio"] *
>+                 100)),
>+             None,
>+             None,
>+             onlyver])
> print
> print

This logic doesn't seem right when show_versions is off (and I'm not convinced it's right when show_versions is on). I tend to think that for the CSV form you don't want to branch based on whether len(module["in_os_versions"]) == 1, since that's a formatting thing rather than relevant to the data.

r=dbaron with those things fixed (though I'm surprised you're still using this code)
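A rough sketch of what "don't branch on the number of versions" might look like for the CSV path: emit one row per module version, with the per-version counts always filled in. It is written for Python 3, and it assumes that module["in_os_versions"] is a dict of version to count and that a matching module["in_sig_versions"] dict exists; that second key, the helper names, and the sample data are all hypothetical rather than taken from the patch.

```python
def pct(ratio):
    """Render a 0..1 ratio as a whole-number percentage, as the patch does."""
    return int(round(ratio * 100))


def module_rows(prefix, sig_total, os_total, module):
    """Return one CSV row per module version, with no single-version case."""
    rows = []
    for version in sorted(module["in_os_versions"]):
        os_ver_count = module["in_os_versions"][version]
        # A version may be absent from this signature's crashes: count it as 0.
        sig_ver_count = module["in_sig_versions"].get(version, 0)
        rows.append(prefix + [
            module["libname"],
            module["in_sig_count"], pct(module["in_sig_ratio"]),
            sig_ver_count, pct(float(sig_ver_count) / sig_total),
            module["in_os_count"], pct(module["in_os_ratio"]),
            os_ver_count, pct(float(os_ver_count) / os_total),
            version,
        ])
    return rows


# Made-up sample data, only to exercise the function.
prefix = ["Firefox", "18.0a1", "Windows NT", "some_signature", "some_reason",
          4, 10]
module = {
    "libname": "example.dll",
    "in_sig_count": 3, "in_sig_ratio": 0.75,
    "in_os_count": 5, "in_os_ratio": 0.5,
    "in_sig_versions": {"1.0": 3},
    "in_os_versions": {"1.0": 4, "2.0": 1},
}
rows = module_rows(prefix, 4, 10, module)
```

A module with a single version then yields a single, fully populated row, so the importer never has to deal with None in the version columns.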
Attachment #669192 -
Flags: review?(dbaron) → review+
Comment 7•12 years ago (by assignee)
Sorry for the churn :(
* make column names more consistent across reports
* do not repeat column names (this matters when we import to db)
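To illustrate the repeated-column problem, here is a small check run against the interesting-modules header tuple from attachment 669192 as quoted in the review. The helper name is hypothetical; the sketch only shows why that header would clash on import, since "version" appears twice.

```python
from collections import Counter


def repeated_names(headers):
    """Return the column names that occur more than once, sorted."""
    counts = Counter(headers)
    return sorted(name for name, n in counts.items() if n > 1)


# Header tuple as quoted from attachment 669192.
headers = ("product", "version", "os_name", "signature",
           "total_signature_count", "total_os_count", "libname",
           "in_signature_count", "in_signature_ratio",
           "in_signature_versions", "in_signature_versions_ratio",
           "in_os_count", "in_os_ratio", "in_os_versions",
           "in_os_versions_ratio", "version")
duplicates = repeated_names(headers)
```

A check like this could run before the header row is written, so a clash fails early instead of at import time.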
Attachment #669192 -
Attachment is obsolete: true
Attachment #669192 -
Flags: feedback?(chris.lonnen)
Attachment #669218 -
Flags: review?(dbaron)
Attachment #669218 -
Flags: feedback?(chris.lonnen)
Updated•12 years ago (by assignee)
Attachment #669218 -
Attachment is obsolete: true
Attachment #669218 -
Flags: review?(dbaron)
Attachment #669218 -
Flags: feedback?(chris.lonnen)
Comment 8•12 years ago (by assignee)
(In reply to David Baron [:dbaron] (recovering from illness; hopefully back Oct 8, but with backlog) from comment #6)
> Comment on attachment 669192 [details] [diff] [review]

Thanks for the review, will fix this up.

> r=dbaron with those things fixed (though I'm surprised you're still using
> this code)

You and me both :)
Comment 9•11 years ago (by assignee)
We're probably going to leave the old system alone and redo this in bug 875990.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE