Closed Bug 798837 Opened 13 years ago Closed 12 years ago

populate postgresql with output of correlation report job

Tracking

(Not tracked)

Status:

RESOLVED DUPLICATE of bug 875990

People

(Reporter: rhelmer, Assigned: rhelmer)

References

Details

Attachments

(4 obsolete files)

quick PoC, needs more testing (13 years ago, Robert Helmer [:rhelmer], 5.38 KB, patch)
add optional CSV output to correlation reports (13 years ago, Robert Helmer [:rhelmer], 5.44 KB, patch)
add optional CSV output to correlation reports (13 years ago, Robert Helmer [:rhelmer], 6.61 KB, patch, dbaron: review+)
add optional CSV output to correlation reports (13 years ago, Robert Helmer [:rhelmer], 6.66 KB, patch)

Robert Helmer [:rhelmer]

Assignee

Description

•

13 years ago

Per bug 798153 comment 1, we need a stopgap until the new way of generating reports is ready (bug 650904).

We need a daily cron job which fetches the correlation reports (example: https://crash-analysis.mozilla.com/crash_analysis/20121006/20121006_Firefox_17.0a2-core-counts.txt), parses them, and inserts the results into postgres.

The old app does this from the UI code, which is the primary reason that it frequently times out and is very slow when it does work:

https://github.com/mozilla/socorro/blob/master/webapp-php/application/controllers/correlation.php
https://github.com/mozilla/socorro/blob/master/webapp-php/application/libraries/Correlation.php
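For illustration, the report location for a given day could be derived as sketched below. The URL pattern is inferred from the single example above, and the function name report_url is hypothetical; this is not code from Socorro.

```python
from datetime import date

# Assumption: the base URL and file naming are inferred from the one example
# URL in the description; other report types may be named differently.
BASE_URL = "https://crash-analysis.mozilla.com/crash_analysis"


def report_url(day, product, version, report_type):
    """Return the guessed URL of one correlation report for a given day."""
    stamp = day.strftime("%Y%m%d")
    return "%s/%s/%s_%s_%s-%s.txt" % (
        BASE_URL, stamp, stamp, product, version, report_type)


print(report_url(date(2012, 10, 6), "Firefox", "17.0a2", "core-counts"))
```

A daily job would call this with the current date for each product/version pair it cares about before fetching and parsing.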

Robert Helmer [:rhelmer]

Assignee

Updated

•

13 years ago

Status: NEW → ASSIGNED

Component: Webapp → Backend

Robert Helmer [:rhelmer]

Assignee

Updated

•

13 years ago

Assignee: nobody → rhelmer

Robert Helmer [:rhelmer]

Assignee

Comment 1

•

13 years ago

Actually it would be more sensible to just modify the existing correlations job (the one that generates these reports) to drop a CSV, and import that into postgres, instead of writing a parser for its (human-readable) custom output format. I'll investigate that and morph the bug if it looks easy to do.
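A minimal sketch of the import side, assuming the CSV has a header row whose names match the columns of a raw table. The table name correlations_core_counts_raw and the helper copy_statement are hypothetical; only the PostgreSQL COPY syntax itself is standard.

```python
import csv
import io


def copy_statement(table, csv_file):
    """Build a PostgreSQL COPY statement whose column list matches the CSV header.

    Names are quoted but not escaped, so this assumes trusted table and
    column names.
    """
    header = next(csv.reader(csv_file))
    columns = ", ".join('"%s"' % name for name in header)
    return 'COPY "%s" (%s) FROM STDIN WITH (FORMAT csv, HEADER true)' % (
        table, columns)


# Hypothetical sample data; the real column set comes from the report script.
sample = io.StringIO("product,version,os_name\nFirefox,18.0a1,Windows NT\n")
print(copy_statement("correlations_core_counts_raw", sample))
```

The statement could then be handed to a driver call that streams a file into COPY (for example psycopg2's cursor.copy_expert). Since the helper consumes the header line, the file would need to be rewound with seek(0) first so that HEADER true skips the right row.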

Robert Helmer [:rhelmer]

Assignee

Updated

•

13 years ago

Summary: daily cronjob to parse correlation report, populate postgresql → populate postgresql with output of correlation report job

Robert Helmer [:rhelmer]

Assignee

Comment 2

•

13 years ago

Attached patch quick PoC, needs more testing (obsolete)

Robert Helmer [:rhelmer]

Assignee

Comment 3

•

13 years ago

Attached patch add optional CSV output to correlation reports (obsolete)

We could integrate this better into Socorro instead of bolting this onto the side, but for the moment I want to be able to get correlation reports into the DB without having to parse the current output files (and I don't want to disturb the current process).

Attachment #669032 - Attachment is obsolete: true

Attachment #669039 - Flags: review?(dbaron)

Attachment #669039 - Flags: feedback?(chris.lonnen)

Robert Helmer [:rhelmer]

Assignee

Comment 4

•

13 years ago

Here is more info on this process, for anyone uninitiated.

The code lives outside of Socorro:
http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/

The script that drives this here on the servers (lives in IT's puppet repo, not in Socorro proper):
/data/bin/cron_libraries.sh

The cronjob pulls some crash report IDs out of postgres and then uses hbaseClient.py to pull the corresponding processed JSON, and dumps it to files such as:
/tmp/Firefox_18.0a1.tar

If you pull that file ^ from prod, you could test the various modes like this:

python per-crash-core-count.py -p Firefox -r 18.0a1 -f Firefox_18.0a1.tar -d Firefox_18.0a1-core-counts.csv > Firefox_18.0a1-core-counts.txt

python per-crash-interesting-modules.py -v -p Firefox -r 18.0a1 -f Firefox_18.0a1.tar -d Firefox_18.0a1-interesting-modules-with-versions.csv > Firefox_18.0a1-interesting-modules-with-versions.txt

The latter also has "-a" (addons), and can generate reports with/without versions.

For our purposes, I think we only need these three:

* core-counts
* addons (with versions)
* modules (with versions)

If I can get those three loaded into raw tables, I believe I can do everything we need to reproduce the (currently somewhat slow/broken) UI.
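The three invocations could be generated by a small driver along these lines. This is only a sketch: the -d flag exists only with the patch attached to this bug, and both the combination of -v with -a and the "interesting-addons-with-versions" file name are assumptions rather than something confirmed above. The helper name report_commands is hypothetical.

```python
def report_commands(product, release, tarball):
    """Return (argv, stdout_path) pairs for the three reports we need."""
    prefix = "%s_%s" % (product, release)
    common = ["-p", product, "-r", release, "-f", tarball]
    # (script, extra flags, report name)
    reports = [
        ("per-crash-core-count.py", [], "core-counts"),
        ("per-crash-interesting-modules.py", ["-v"],
         "interesting-modules-with-versions"),
        # Assumption: "-v -a" together and this output name are guesses.
        ("per-crash-interesting-modules.py", ["-v", "-a"],
         "interesting-addons-with-versions"),
    ]
    commands = []
    for script, flags, name in reports:
        stem = "%s-%s" % (prefix, name)
        argv = ["python", script] + flags + common + ["-d", stem + ".csv"]
        commands.append((argv, stem + ".txt"))
    return commands


for argv, stdout_path in report_commands("Firefox", "18.0a1",
                                         "Firefox_18.0a1.tar"):
    print(" ".join(argv), ">", stdout_path)
```

Returning argv lists rather than shell strings keeps the sketch usable with subprocess without any quoting concerns; the text report would be captured by redirecting stdout to stdout_path.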

Robert Helmer [:rhelmer]

Assignee

Comment 5

•

13 years ago

Attached patch add optional CSV output to correlation reports (obsolete)

* store signature and reason in separate columns
* use " not ' for consistency with existing code

Attachment #669039 - Attachment is obsolete: true

Attachment #669039 - Flags: review?(dbaron)

Attachment #669039 - Flags: feedback?(chris.lonnen)

Attachment #669192 - Flags: review?(dbaron)

Attachment #669192 - Flags: feedback?(chris.lonnen)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 6

•

13 years ago

Comment on attachment 669192 [details] [diff] [review]
add optional CSV output to correlation reports

In both files:

>         signame = signame + "|" + crash["reason"]
>     signature = osys["signatures"].setdefault(signame,
>                   { "count": 0,
>-                    "core_counts": {} })
>+                    "core_counts": {},
>+                    "sig": crash["signature"],
>+                    "reason": crash["reason"] })

you want this "sig" to include all of the munging that happened to signame prior to the line at the top of the quote (i.e., the removal of numeric stuff). Otherwise it's not true for all of the crashes described. Easy to fix by introducing a new variable (say, sig).

>+    headers = ["product", "version", "os_name", "signature", "reason",
>+               "total_signature_count", "total_os_count", "sig_count",
>+               "sig_ratio", "os_count", "os_ratio", "family", "core_count"]

>diff -r 7cdfea7cdb3b per-crash-interesting-modules.py
>+    headers = ("product", "version", "os_name", "signature",
>+               "total_signature_count", "total_os_count", "libname",
>+               "in_signature_count", "in_signature_ratio",
>+               "in_signature_versions", "in_signature_versions_ratio",
>+               "in_os_count", "in_os_ratio", "in_os_versions",
>+               "in_os_versions_ratio", "version")

These header names could perhaps be a bit more consistent between the two files (e.g., sig_count and sig_ratio vs. in_signature_count and in_signature_ratio). But they could also be named a bit more clearly (introducing some differences), though I'm not sure how to keep that short:

  total_signature_count -> os_crashes_with_signature
  total_os_count -> os_crashes
  sig_count -> os_crashes_with_signature_and_core_count
  sig_ratio -> core_count_portion_for_signature_and_os
  os_count -> os_crashes_with_core_count
  os_ratio -> core_count_portion_for_os

>+
>+                if options.csv_filename:
>+                    writer.writerow([options.product, options.release,
>+                                     osname, sig["sig"], sig["reason"],
>+                                     sig["count"], osys["count"],
>+                                     module["libname"].encode("UTF-8"),
>+                                     module["in_sig_count"],
>+                                     int(round(module["in_sig_ratio"] *
>+                                               100)),
>+                                     sig_ver_count,
>+                                     int(round(float(sig_ver_count) /
>+                                               sig["count"] * 100)),
>+                                     module["in_os_count"],
>+                                     int(round(module["in_os_ratio"] *
>+                                               100)),
>+                                     os_ver_count,
>+                                     int(round(float(os_ver_count) /
>+                                               osys["count"] * 100)),
>+                                     version])
>+
>+            else:
>+                if options.csv_filename:
>+                    writer.writerow([options.product, options.release,
>+                                     osname, sig["sig"], sig["reason"],
>+                                     sig["count"], osys["count"],
>+                                     module["libname"].encode("UTF-8"),
>+                                     module["in_sig_count"],
>+                                     int(round(module["in_sig_ratio"] *
>+                                               100)),
>+                                     None,
>+                                     None,
>+                                     module["in_os_count"],
>+                                     int(round(module["in_os_ratio"] *
>+                                               100)),
>+                                     None,
>+                                     None,
>+                                     onlyver])
>             print
>             print

This logic doesn't seem right when show_versions is off (and I'm not convinced it's right when show_versions is on). I tend to think that for the CSV form you don't want to branch based on whether len(module["in_os_versions"]) == 1, since that's a formatting thing rather than relevant to the data.

r=dbaron with those things fixed (though I'm surprised you're still using this code)
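The first review point (keep the munged signature in its own variable so that "sig" is true for every crash grouped under the key) could look roughly like this. It is a sketch only: normalize_signature is a hypothetical stand-in for the script's real munging, whose exact rules are not shown in this bug, and record_crash is an illustrative wrapper rather than a function from the patch.

```python
import re


def normalize_signature(raw):
    # Hypothetical stand-in for the "removal of numeric stuff" mentioned in
    # the review; the real script's rules may differ.
    return re.sub(r"\d+", "", raw)


def record_crash(osys, crash):
    """Group a crash under its munged signature plus reason."""
    sig = normalize_signature(crash["signature"])
    signame = sig + "|" + crash["reason"]
    signature = osys["signatures"].setdefault(signame, {
        "count": 0,
        "core_counts": {},
        "sig": sig,  # the munged value, not crash["signature"]
        "reason": crash["reason"],
    })
    signature["count"] += 1
    return signature


osys = {"signatures": {}}
record_crash(osys, {"signature": "foo123", "reason": "EXC_BAD_ACCESS"})
record_crash(osys, {"signature": "foo456", "reason": "EXC_BAD_ACCESS"})
print(osys["signatures"])
```

Both sample crashes land in one entry keyed "foo|EXC_BAD_ACCESS", and the stored "sig" describes both of them, which is the property the review asks for.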

Attachment #669192 - Flags: review?(dbaron) → review+

Robert Helmer [:rhelmer]

Assignee

Comment 7

•

13 years ago

Attached patch add optional CSV output to correlation reports (obsolete)

Sorry for the churn :(

* make column names more consistent across reports
* do not repeat column names (this matters when we import to db)
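As a quick illustration of the repeated-column problem, a check like the following flags the duplicate in the modules header quoted in comment 6. The helper name duplicate_columns is hypothetical and not part of the patch.

```python
from collections import Counter


def duplicate_columns(headers):
    """Return the sorted list of column names that appear more than once."""
    counts = Counter(headers)
    return sorted(name for name, n in counts.items() if n > 1)


# Header tuple as quoted in the review of attachment 669192.
modules_headers = ("product", "version", "os_name", "signature",
                   "total_signature_count", "total_os_count", "libname",
                   "in_signature_count", "in_signature_ratio",
                   "in_signature_versions", "in_signature_versions_ratio",
                   "in_os_count", "in_os_ratio", "in_os_versions",
                   "in_os_versions_ratio", "version")

print(duplicate_columns(modules_headers))  # prints ['version']
```

A database table cannot have two columns with the same name, so a header like this would need one of the two "version" columns renamed before import.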

Attachment #669192 - Attachment is obsolete: true

Attachment #669192 - Flags: feedback?(chris.lonnen)

Attachment #669218 - Flags: review?(dbaron)

Attachment #669218 - Flags: feedback?(chris.lonnen)

Robert Helmer [:rhelmer]

Assignee

Updated

•

13 years ago

Attachment #669218 - Attachment is obsolete: true

Attachment #669218 - Flags: review?(dbaron)

Attachment #669218 - Flags: feedback?(chris.lonnen)

Robert Helmer [:rhelmer]

Assignee

Comment 8

•

13 years ago

(In reply to David Baron [:dbaron] (recovering from illness; hopefully back Oct 8, but with backlog) from comment #6)
> Comment on attachment 669192 [details] [diff] [review]

Thanks for the review, will fix this up.

> r=dbaron with those things fixed (though I'm surprised you're still using
> this code)

You and me both :)

Robert Helmer [:rhelmer]

Assignee

Comment 9

•

12 years ago

We're probably going to leave the old system alone and redo this in bug 875990.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → DUPLICATE


Categories

(Socorro :: Backend, task)
