Closed Bug 650904 Opened 13 years ago Closed 11 years ago

pull data for correlations from pig, implement in postgres

Categories

(Socorro :: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: laura, Assigned: rhelmer)

References

Details

(Whiteboard: [Q42011wanted] [Q12012wanted])

The correlation scripts should be refactored to be more Socorro-ish; namely, they should use Socorro configuration files and other libraries.

See bug 629029 for main details and bug 649891, bug 623164 for some of the symptoms.
This script was checked into Socorro SVN in bug 629029, although we're still using the old version for 1.7.7 release.

Looks like it should be pretty easy to convert this shell script to Python and have it use Socorro configuration files, logging, etc. It has its own rules for deciding which versions to use, which we could carry forward or simplify: http://code.google.com/p/socorro/source/browse/trunk/scripts/crons/cron_libraries.sh#

Seems like this is intended to find the most active releases, perhaps we should just use the same table that the "branch data sources" uses.
Status: NEW → ASSIGNED
Hmm, isn't this a dupe of bug 642325?
Assignee: rhelmer → chris.lonnen
Target Milestone: 1.7.8 → 2.0
See Also: → 642325
So, here's the table structure we want:

table os_names ( 
    os_name text not null primary key )

this is a lookup list containing only the major OS names (OSX, Linux, Windows NT and Solaris).  It's there for convenience and does not change.

table module_crash_counts (
    module_crash_count_id integer not null
    crash_day date not null
    os_name text not null
    field??? text?
    module_signature text not null
    field??? text?
    crash_count integer not null
)

This table contains the counts of crashes for that module/OS for that day. These rows are the same for both reports and are shared; as a result, attempts to insert duplicate rows will be silently ignored (trigger).  The two field??? references are because we're still researching information about how the modules are currently identified in the report.

table module_crash_core_counts (
  module_crash_count_id integer not null FK module_crash_counts
  processor_type text not null
  cores integer not null
  crash_count integer not null
  key module_crash_count_id, processor_type, cores
)

This is the table for storing the core_counts reports.

table module_crash_interesting_addons (
  module_crash_count_id integer not null FK module_crash_counts
  addon_key text not null
  addon_source text
  addon_url text
  addon_version text
  crash_count integer not null
  key module_crash_count_id, addon_key, addon_version
)

This table populates both the interesting_addons and interesting_addons_with_versions reports.

table module_crash_interesting_modules (
  module_crash_count_id integer not null FK module_crash_counts
  module_signature text not null
  module_version text not null
  crash_count integer not null
  key module_crash_count_id, module_signature, module_version
);

This table populates both the interesting_modules and interesting_modules_with_versions reports.
So, questions to answer:

1) What are the two unidentified fields???

2) Are the column names I've given above sensible?  Since I don't really understand the data, I may have misidentified some of it.

3) How long do we keep this data?  It is imperative that we don't add new tables to the database which never expire data.
new (final) schema:

table os_names ( 
    os_name text not null primary key )

this is a lookup list containing only the major OS names (OSX, Linux, Windows NT and Solaris).  It's there for convenience and does not change.

table correlation_crash_counts (
    correlation_id integer not null
    crash_day date not null
    os_name text not null
    signature text not null
    reason text not null
    crash_count integer not null
    key crash_day, os_name, signature, reason
)

This table contains the counts of crashes for that signature/OS/reason for that day. These rows are the same for both reports and are shared; as a result, attempts to insert duplicate rows will be silently ignored (trigger).
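The duplicate-ignoring trigger itself isn't spelled out in this bug. As a rough illustration of the intended semantics only (the real implementation would be a PostgreSQL trigger, and the column values here are made up), the behavior is the same as an insert that silently no-ops on a duplicate key, sketched with the stdlib sqlite3 module:

```python
import sqlite3

# Illustrative sketch: mimics "attempts to insert duplicate rows will be
# silently ignored (trigger)". Table and column names follow the schema
# sketch above; the sample row values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE correlation_crash_counts (
        crash_day TEXT NOT NULL,
        os_name TEXT NOT NULL,
        signature TEXT NOT NULL,
        reason TEXT NOT NULL,
        crash_count INTEGER NOT NULL,
        PRIMARY KEY (crash_day, os_name, signature, reason)
    )
""")

row = ("2012-08-01", "Windows NT", "js::GC", "EXCEPTION_ACCESS_VIOLATION", 42)
# First insert succeeds; the second is silently dropped instead of raising.
conn.execute("INSERT OR IGNORE INTO correlation_crash_counts VALUES (?,?,?,?,?)", row)
conn.execute("INSERT OR IGNORE INTO correlation_crash_counts VALUES (?,?,?,?,?)", row)

count = conn.execute("SELECT count(*) FROM correlation_crash_counts").fetchone()[0]
print(count)  # 1 -- the duplicate was ignored, not errored
```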

table correlation_core_counts (
  correlation_id integer not null FK correlation_crash_counts
  processor_type text not null
  cores integer not null
  crash_count integer not null
  key correlation_id, processor_type, cores
)

This is the table for storing the core_counts reports.

table correlation_addons (
  correlation_id integer not null FK correlation_crash_counts
  addon_key text not null
  addon_source text
  addon_url text
  addon_version text
  crash_count integer not null
  key correlation_id, addon_key, addon_version
)

This table populates both the interesting_addons and interesting_addons_with_versions reports.

table correlation_modules (
  correlation_id integer not null FK correlation_crash_counts
  module_signature text not null
  module_version text not null
  crash_count integer not null
  key correlation_id, module_signature, module_version
);

This table populates both the interesting_modules and interesting_modules_with_versions reports.
feh, those are NOT final.  Final ones still coming.
Target Milestone: 2.0 → 2.1
Blocks: 665605
Target Milestone: 2.1 → 2.2
Blocks: 623164
Blocks: 642325
Target Milestone: 2.2 → 2.3
Target Milestone: 2.3 → 2.4
Blocks: 628341
Whiteboard: Q42011wanted
Blocks: 683388
Blocks: 678373
Assignee: chris.lonnen → josh
Whiteboard: Q42011wanted → [Q42011wanted] [Q12012wanted]
Target Milestone: 2.4 → 2.4.1
Component: Socorro → General
Product: Webtools → Socorro
This needs coordination and UI work as well as database work.   Moving to 2.4.3.
Target Milestone: 2.4.1 → 2.4.3
http://dbaron.org/mozilla/topcrash-modules <-- example correlation reports, in case you can't find one in the webapp.
So, question for KaiRo:

Correlation by OS and Architecture are slotted to be in signature summary independently. Is there value in also having a correlation report for these?
(In reply to Chris Lonnen :lonnen from comment #11)
> Correlation by OS and Architecture are slotted to be in signature summary
> independently. Is there value in also having a correlation report for these?

Not really. But we don't have correlation reports for those right now either (architecture != CPU Cores). Let's bring what we have right now in the script to in-Socorro correlation reports, and then we still can look into other possibilities.
So, plans per meeting Friday:

(1) Correlations per addon:  counts will be kept in a matview and populated by stored procedure.  However, addon names/urls/sources are not available in the socorro system at all; they are available from the addon API service.  Undecided whether we should cache them somewhere or just look them up by JS from the API.  Thoughts?

(2) Correlations per module: full module data is currently not available from Postgres or from the processed dumps at all as far as we can tell; what we have are only plugin modules.  As a result, this data would need to be pulled from Hadoop somehow, probably a daily python script which calls a map/reduce job and then stashes the results in Postgres.

(3) Correlations per architecture/core: this data is available from Postgres, but it's not clear how useful it is.  See discussion with Kairo on this bug.

Table schema for the matviews has been checked in.
Kairo,

Your answer above is hard to interpret.  Do we need the Architecture/Core correlations, or don't we?
> (2) Correlations per module: full module data is currently not available
> from Postgres or from the processed dumps at all as far as we can tell; what
> we have are only plugin modules.  As a result, this data would need to be
> pulled from Hadoop somehow, probably a daily python script which calls a
> map/reduce job and then stashes the results in Postgres.

Actually, looking at http://dbaron.org/mozilla/topcrash-modules, it appears that "modules" on the correlations report only refers to plugin modules and not all modules.   Is this correct?
(In reply to [:jberkus] Josh Berkus from comment #14)
> Your answer above is hard to interpret.  Do we need the Architecture/Core
> correlations, or don't we?

We use the "CPU Cores" correlations quite rarely, but we have done it at times. What I was saying above is that "CPU Cores" is something different from "Architecture" - one is CPU information, the other is the basic architecture (say, x86 or amd64 or arm) that the _build_ is compiled for. So, the "CPU Cores" correlation tells us e.g. if people on multicore or single-core CPUs are more likely to hit the crash than the average, while the "Architecture" signature summary tells us how many of those people have a 32bit, 64bit, or ARM build. So it's two completely different things.


(In reply to [:jberkus] Josh Berkus from comment #15)
> Actually, looking at http://dbaron.org/mozilla/topcrash-modules, it appears
> that "modules" on the correlations report only refers to plugin modules and
> not all modules.   Is this correct?

No. When looking at the correlation reports, you'll see that all kinds of stuff is in there: system DLLs, modules hooking into our process, firefox.exe, every binary with executable code that the crashing process has loaded.
(In reply to [:jberkus] Josh Berkus from comment #13)
> However, addon names/urls/sources are not available in
> the socorro system at all; the are available from the addon API service. 
> Undecided whether we should cache them somewhere or just look them up by JS
> from the API.  Thoughts?

From our position, either is fine. If it takes an XHR and a bit of time until the actual name appears (i.e. if it's uncached and you fetch via JS), that's fine with us. There are a number of add-on IDs we need to manually look up somewhere anyhow because they're not in AMO at all.
Still, this is data that rarely ever changes (people _can_ rename an add-on on AMO, but it's really rare and probably never substantial), so caching shouldn't be too hard in the end.
mass migration of correlation-report related bugs that won't land today.
This should no longer be assigned to me, since the DB infrastructure for correlations is complete (AFAIK).  Assigning to lonnen.
Assignee: josh → chris.lonnen
Target Milestone: 2.4.3 → 2.4.4
Target Milestone: 2.4.4 → 2.5.1
Target Milestone: 2.5.1 → 2.5.2
Kicking these out to 2.6, which roughly matches the next week I'll be working on Socorro. If someone wants to poach these to get them done earlier, that's fine with me.
Target Milestone: 2.5.2 → 2.6
Target Milestone: 4 → 6
Assignee: chris.lonnen → peterbe
Target Milestone: 6 → 7
Target Milestone: 7 → 8
Target Milestone: 8 → 10
Target Milestone: 10 → 11
I've been looking into reimplementing the Socorro UI and correlation reports are one of the big pain points right now. It's slow and doesn't seem to work on the topcrashers page as far as I can tell. I'd like to just fix it now, and avoid having to reimplement the current approach.

The current method works something like this:

report (topcrashers, etc) 
  -> XHR call to Correlation_Controller
    -> fetch and parse a report such as [1]

There is some attempt to cache, but since we know the schedule on which these are uploaded to the server, there's no point in letting the first user suffer. I can't see any reason this should use XHR at all, instead of just being a backend job like bugzilla, ftpscraper, etc. that populates the tables Josh describes in comment 5.

So, I propose we have a backend service to fetch the reports, and also to fetch and store addon info (per comment 17 it rarely changes), and provide a middleware service to get at the data. Then the UI can simply call the service and generate HTML.

Does that make sense? It'd be nice to change the summary to something more actionable if so :) 
peterbe, I can help with this if you need, too - lmk.

[1] https://crash-analysis.mozilla.com/crash_analysis/20120516/20120516_Firefox_15.0a1-core-counts.txt
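The backend-job idea above could start as small as a daily fetcher. This is only a sketch: the function and constant names are invented, and the URL layout is inferred from the sample link above rather than from any documented API.

```python
from datetime import date
from urllib.request import urlopen

# Assumption: reports live under this base path, named by date/product/version,
# matching the sample link in this comment.
BASE = "https://crash-analysis.mozilla.com/crash_analysis"

def report_url(day, product, version, report="core-counts"):
    """Build the URL for one day's correlation report (URL shape is an
    assumption based on the sample link, not a documented contract)."""
    stamp = day.strftime("%Y%m%d")
    return "%s/%s/%s_%s_%s-%s.txt" % (BASE, stamp, stamp, product, version, report)

def fetch_report(day, product, version):
    """Fetch one day's report; meant for a daily cron job, not a per-user XHR."""
    with urlopen(report_url(day, product, version)) as resp:
        return resp.read().decode("utf-8")

print(report_url(date(2012, 5, 16), "Firefox", "15.0a1"))
# -> .../20120516/20120516_Firefox_15.0a1-core-counts.txt
```

A cron job would call `fetch_report` once per active product/version after the upload window and insert the parsed rows into the tables from comment 5.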
We should have proper DB tables or matviews that are being filled with the correlation data instead of pulling in the text files generated by the current scripts. If we have the info in proper DB tables and can get it via the middleware like any other stuff we get from the DB, that would make dealing with the UI pretty easy, I guess.
rhelmer: I think the goal is actually to generate these within Socorro rather than pulling them in from the text files

The database schema for storage is done (see above), so the remaining parts are:
- Stored proc + cron job to generate the data, daily
- Middleware
- UI (separate bug for this)
(In reply to Laura Thomson :laura from comment #23)
> rhelmer: I think the goal is actually to generate these within Socorro
> rather than pulling them in from the text files

Yeah understood (I thought that was bug 642325 specifically? there seems to be some overlap)

> The database schema for storage is done (see above), so the remaining parts
> are:
> - Stored proc + cron job to generate the data, daily

I suspect we could get it working now by doing the below parts first, and then not have to change them when we have generation moved directly into socorro

That said, I'd be happy to help out with the above too if everyone wants to do that bit first.

> - Middleware
> - UI (separate bug for this)
Talk to Lonnen about where we left this.

There are already Sprocs+Matviews for two of the three correlation reports.  We can't currently do the "modules" correlation report because only "interesting" modules are stored in the PostgreSQL database, and that report needs to search *all* modules.  Lonnen was working on getting an aggregate of the complete list of modules out of HBase, although we could try a different approach.
(In reply to [:jberkus] Josh Berkus from comment #25)
> Talk to Lonnen about where we left this.
> 
> There are already Sprocs+Matviews for two out of the 3 correlation reports. 
> We can't currently do the "modules" correlation report because only
> "interesting" modules are stored in the PostgreSQL database, and that report
> needs to search *all* modules.  Lonnen was working on getting an aggregate
> of the complete list of modules out of Hbase, although we could try a
> different approach.

Socorro actually runs a mapreduce job nightly that does this, I think:
https://github.com/mozilla/socorro/blob/ae282ea69ef8d663519a7628e530d27c1a02ba53/analysis/src/java/com/mozilla/socorro/hadoop/CrashReportModuleList.java

Output is here:

https://crash-analysis.mozilla.com/crash_analysis/modulelist/

If that is not exactly what we need I am sure we can adjust. If it does happen to be exactly what we need, it'd be easy to also insert into the DB as well as generate that file.
Well, the problem is that we need to have a complete list of modules per crash to generate the aggregates in PostgreSQL.  A canonical list of modules without a relationship to crashes doesn't do us much good.

When I get back from pgcon, I'll look at the exact aggregate we need; generating it daily via a scheduled M/R job would be a fine approach.
Blocks: 749359
Target Milestone: 11 → Future
Assignee: peterbe → rhelmer
Summary: Refactor correlation reports to be more Socorro-ish → move correlation reports to postgres
So what I'd need from the aggregation job is the following grouping:

Group By:
product
version
crash reason
operating system name
signature
module_name
module_signature

Operation:
count of crashes

Yes, this means that the same crash will be counted for multiple modules.

The domain of the search would be just the last complete UTC day (usually yesterday), and (if you can make this happen) currently "active" versions only.  You probably don't have active version information, though.

So a sample pseudo-SQL query would be:

SELECT product, version, reason, os_name, signature, 
      module_name, module_signature, count(*)
FROM reports JOIN modules
   JOIN product_versions
WHERE date_trunc('day', date_processed) = ( current_date - 1 ) 
   AND ( current_date -1 ) between release_date and sunset_date
GROUP BY product, version, reason, os_name, signature, 
      module_name, module_signature
Xavier, please see comment 29 - I can take a shot at doing this with Pig and get review from you, but might need some help getting set up.
Summary: move correlation reports to postgres → implement correlation reports in pig, output to postgres
(In reply to Robert Helmer [:rhelmer] from comment #30)
> Xavier, please see comment 29 - I can take a shot at doing this with Pig and
> get review from you, but might need some help getting set up.

I'm set up now and working on this.

(In reply to [:jberkus] Josh Berkus from comment #29)

Looks doable.

> The domain of the search would be just the last complete UTC day (usually
> yesterday), and (if you can make this happen) currently "active" versions
> only.  You probably don't have active version information, though.

Since this is triggered from Socorro, we could pass params to pig such as active versions, product name etc. to filter on, so this is doable.

I am going to take a first stab without this though since it's a little bigger in scope (need to make the cron job that calls this smarter etc)
Josh - please try this sample: http://people.mozilla.org/~rhelmer/temp/correlations-20120801.csv.gz

This is a little more than you asked for - it has everything the "modulelist" report has (which has the debug_filename and also module_version, which you didn't ask for) and a little more (including count). I assume that by "module_signature" you mean what we call "debug id" in the code.

Note that I am doing no sanity checking of the data, so there are missing fields - I can filter these out if you want. Also this doesn't filter by active versions as I mentioned, but that's something we could add pretty easily.

If the extra info isn't a problem for you, I think we could use this info to replace the current modulelist impl, since it's a superset of that. It'd make sense to just have one nightly run instead of two, so I am inclined to keep all this in if it isn't a problem.
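The sanity check mentioned above could be as simple as dropping rows with empty required fields before the Postgres load. A sketch (the column names here are illustrative; the real layout comes from the pig job's output):

```python
import csv
import io

def filter_complete_rows(lines, required):
    """Yield only CSV rows where every required column is non-empty,
    dropping the rows with missing fields mentioned above."""
    reader = csv.DictReader(lines)
    for row in reader:
        if all(row.get(col) for col in required):
            yield row

# Invented sample resembling the correlations CSV.
sample = io.StringIO(
    "signature,os_name,module_name,count\n"
    "js::GC,Linux,libxul.so,12\n"
    "js::GC,Linux,,3\n"  # missing module_name: filtered out
)
rows = list(filter_complete_rows(sample, ["signature", "os_name", "module_name"]))
print(len(rows))  # 1
```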
Summary: implement correlation reports in pig, output to postgres → pull data for correlations from pig, implement in postgres
xstevens - can you please review this pull request? https://github.com/mozilla-metrics/socorro-toolbox/pull/7

Obviously based heavily off of the existing modulelist.pig, per comment 32 I think we could potentially replace that one with this one.
Rob,

Data looks pretty good, except that we're missing crash_reason.
(In reply to [:jberkus] Josh Berkus from comment #34)
> Rob,
> 
> Data looks pretty good, except that we're missing crash_reason.

Oops, sorry about that. Here's a new version:
http://people.mozilla.org/~rhelmer/temp/correlations-20120801-v2.csv.gz

Updated the PR as well.
(In reply to Robert Helmer [:rhelmer] from comment #35)
> http://people.mozilla.org/~rhelmer/temp/correlations-20120801-v2.csv.gz

BTW, note that I put the header in there; the pig script doesn't do this (perhaps it should, now that I think about it; it's not exactly intuitive).
Yes, please.

The header will really help us if something goes wrong with the data.
(In reply to [:jberkus] Josh Berkus from comment #37)
> Yes, please.
> 
> The header will really help us if something goes wrong with the data.

The version of pig we're using (0.8) doesn't really have a great way to do this (there is a way in 0.9), but I can have the script that's driving the overall run put the header in for now.

That means there's some danger of the header and actual columns going out of sync; there's not an easy way to deal with this without writing our own storage UDF (which I think is a disproportionate amount of work versus the risk, considering that we'll be able to use something more standard in the near future).
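Since the header would be prepended by the driver script rather than by pig 0.8 itself, a cheap guard against the drift risk above is to check each row's column count against the header when writing. A sketch (the header list is illustrative, not the actual column order):

```python
# Illustrative column list; the real one must match the pig script's output.
HEADER = ["product", "version", "reason", "os_name", "signature",
          "module_name", "module_signature", "count"]

def prepend_header(pig_lines, sep="\t"):
    """Yield a header line, then the pig output lines, refusing rows whose
    column count disagrees with the header -- a cheap out-of-sync guard."""
    yield sep.join(HEADER)
    for line in pig_lines:
        if line.count(sep) != len(HEADER) - 1:
            raise ValueError("row has wrong column count: %r" % line)
        yield line

rows = ["Firefox\t15.0a1\tSIGSEGV\tLinux\tjs::GC\tlibxul.so\tABC123\t12"]
out = list(prepend_header(rows))
print(len(out))  # 2: header plus one data row
```

This doesn't prevent a reordered-but-same-width drift, but it catches the common case of an added or dropped column without needing a custom storage UDF.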
Here's a very simple script for generating the report and importing to postgres:
https://github.com/rhelmer/socorro/commit/418b75107d39c8e7d3690ec9cfdbb5b185aa1e94

I've landed the report itself here after review from xstevens:
https://github.com/mozilla-metrics/socorro-toolbox/

This depends on some changes to replace the old java analysis stuff with submodules and pig in bug 765001 (it's branched off of that branch in fact)
Depends on: 765001
Target Milestone: Future → 19
Blocks: 788048
Target Milestone: 19 → 20
Target Milestone: 20 → 21
Target Milestone: 21 → 23
Depends on: 792630
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/bc328ffe5f2a0eb79e2078089ae12e1fc67ef3f6
bug 650904 - start correlations job from pig, insert into postgres

https://github.com/mozilla/socorro/commit/1120b11ed0a842c5cac7c87a2f7377e7a8607f6f
Merge pull request #859 from rhelmer/bug650904-correlations-pig

bug 650904 - start correlations job from pig, insert into postgres
(In reply to [:jberkus] Josh Berkus from comment #37)
> Yes, please.
> 
> The header will really help us if something goes wrong with the data.

OK this has landed and I am testing it for inclusion into Socorro.

Into which table(s) does the module info from hbase+pig go?

I see a correlation_modules table, but it seems to be unused by any stored procedures. Is this intended as the matview table?
Flags: needinfo?(josh)
Yes, it is.

My original design was that you'd land the data into a raw data table (correlation_modules_raw, maybe) and then a sproc would clean it up and normalize it and put it into correlation_modules.  Talk to Selena about how she wants to handle it now, though.
Flags: needinfo?(josh)
(In reply to [:jberkus] Josh Berkus from comment #45)
> Yes, it is.
> 
> My original design was that you'd land the data into a raw data table
> (correlation_modules_raw, maybe) and then a sproc would clean it up and
> normalize it and put it into correlation_modules.  Talk to Selena about how
> she wants to handle it now, though.

OK thanks for the info, I had thought this was further along and just waiting on the extra modulelist data to proceed.

I am going to change tack for the django version - we should proceed with this but not blocking on it.
No longer blocks: 749359, 788048
No longer blocks: 683388
Target Milestone: 23 → Future
Selena is working on a better way in bug 875990. 

Selena, you may find some of the history in this bug interesting if you're stuck etc.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX