889041 - Socorro API: return any raw field from report/list

Reporter

Description

•

11 years ago

Related to bug 889040: the report/list API should be configurably able to return additional fields from the raw JSON. For public users, this should be limited to the whitelisted set of raw fields in the API, bug 858245. It would be nice for logged-in users to be able to see full raw data, but that that is a secondary goal.

[DEACTIVATED] Adrian Gaudebert

Updated

•

11 years ago

Depends on: 901977

Laura Thomson :laura

Updated

•

11 years ago

Assignee: nobody → peterbe

[DEACTIVATED] Adrian Gaudebert

Comment 1

•

11 years ago

Re-reading this bug, it actually doesn't depend on bug 901977. The report/list page doesn't use elasticsearch at the moment, all its data comes from postgresql. I am not sure how that overlaps with supersearch, as it will be able to show any field from the processed JSON at first, and raw JSON when 889040 is done (hopefully before the end of q3). We might still want to have that ability on both pages.

No longer depends on: 901977

Peter Bengtsson [:peterbe]

Assignee

Updated

•

11 years ago

Depends on: 912132

Laura Thomson :laura

Updated

•

11 years ago

Target Milestone: --- → 60

Peter Bengtsson [:peterbe]

Assignee

Updated

•

11 years ago

Target Milestone: 60 → 61

Peter Bengtsson [:peterbe]

Assignee

Comment 2

•

11 years ago

Selena, My objective is to extend the reports page (Reports tab on the reports list) with columns from the raw crash. http://cl.ly/RSUn Currently that table is generated by querying socorro.external.postgresql.report:Report.get_list() which, it appears, simply does a plain select from `reports`. Can I just add an inner join on `raw_crash` and join on uuid and then include `raw_crash` which is a JSON type? Does that sound right? Then I can unpack that massive JSON thing when I render the template based on which columns specifically the user has chosen to see.

Flags: needinfo?(sdeckelmann)

Peter Bengtsson [:peterbe]

Assignee

Comment 3

•

11 years ago

Sorry, I think it should be a left OUTER join.

Peter Bengtsson [:peterbe]

Assignee

Comment 4

•

11 years ago

Pull request: https://github.com/mozilla/socorro/pull/1538

[github robot]

Comment 5

•

11 years ago

Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/8bdd29f03c5aec9db37f11e6d00394a195979986 fixes bug 889041 - return any raw field from report/list, r=AdrianGaudebert

[github robot]

Updated

•

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Peter Bengtsson [:peterbe]

Assignee

Comment 6

•

11 years ago

The new `LEFT OUTER JOIN` added to that query turned out to be MUCH too slow. Some queries took 10 minutes!

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

[github robot]

Comment 7

•

11 years ago

Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/27ada56def564db9fc430ed22c7cc72ee1a2c039 Revert "fixes bug 889041 - return any raw field from report/list, r=AdrianGaudebert" This reverts commit 8bdd29f03c5aec9db37f11e6d00394a195979986.

[github robot]

Updated

•

11 years ago

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Peter Bengtsson [:peterbe]

Assignee

Comment 8

•

11 years ago

Stay reopened darn it!

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Peter Bengtsson [:peterbe]

Assignee

Comment 9

•

11 years ago

Selena, Can you help us figure out why the left outer join is so impressively slow? If there is no way to make it faster we're going to have to abandon this approach entirely.

Peter Bengtsson [:peterbe]

Assignee

Comment 10

•

11 years ago

Apparently even the `select count()` causes massive seq scan on raw_crashes!! http://pastebin.mozilla.org/3147786

Peter Bengtsson [:peterbe]

Assignee

Comment 11

•

11 years ago

So I ran that query (3 days span) on dev: http://peterbe.pastebin.mozilla.org/3147873 That's 0.2ms to join 14 million reports with 14 million rows in raw_crashes. On stage, the same query takes 1160ms. http://pastebin.mozilla.org/3147829 At the time of writing, dev db is at schema revision 608afb3824d and stage is at 523b9e57eba2

Peter Bengtsson [:peterbe]

Assignee

Comment 12

•

11 years ago

OOps! I used the wrong time interval on dev. With August instead I get: http://peterbe.pastebin.mozilla.org/3147988 97,000 rows took 3,000ms to do a count on.

Robert Helmer [:rhelmer]

Comment 13

•

11 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #11) > So I ran that query (3 days span) on dev: > http://peterbe.pastebin.mozilla.org/3147873 > That's 0.2ms to join 14 million reports with 14 million rows in raw_crashes. > > On stage, the same query takes 1160ms. > http://pastebin.mozilla.org/3147829 Without any WHERE clause on raw_crashes.date_processed, this query seems to be a sequential scan across all raw_crashes partitions. With the WHERE clause it does an index scan. I am not sure we can compare the contents of dev/stage/prod directly, they potentially have very different sets of data. > At the time of writing, dev db is at schema revision 608afb3824d and stage > is at 523b9e57eba2 I looked into this and think it's a red herring (but would be nice to fix) - latest migration doesn't look related.

Robert Helmer [:rhelmer]

Comment 14

•

11 years ago

(In reply to Peter Bengtsson [:peterbe] from comment #10) > Apparently even the `select count()` causes massive seq scan on raw_crashes!! > http://pastebin.mozilla.org/3147786 I see this same result across dev/stage/prod. If you add a WHERE clause for raw_crashes.date_processed (equivalent date range to the one used on reports) all the seq scans go away, and index scans are used on a smaller set of partitions instead.

Benjamin Smedberg

Reporter

Comment 15

•

11 years ago

You'd have to check the EXPLAIN results, but you can also try joining raw_crashes and crashes on raw_crashes.date_processed = reports_clean.date_processed AND raw_crashes.uuid = reports_clean.date_processed. That may cause the query planner to jump to the single correct partition to look it up. However, meta-question! Aren't the list of crashes in report/list generated from elasticsearch queries anyway? Why couldn't we just use the data from elasticsearch, instead of going back to postgres to get it again?

[DEACTIVATED] Adrian Gaudebert

Comment 16

•

11 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #15) > However, meta-question! Aren't the list of crashes in report/list generated > from elasticsearch queries anyway? Why couldn't we just use the data from > elasticsearch, instead of going back to postgres to get it again? That is not the case, raw crash data is not even in elasticsearch at the moment. PostgreSQL has its own copy of that data. And report/list has always been using PostgreSQL, and not elasticsearch. We had planned to switch some day, but that never happened.

Benjamin Smedberg

Reporter

Comment 17

•

11 years ago

I don't see how it's going to be possible to fix bug 897139 properly without driving report/list from elasticsearch: how can you apply the same search limits to report/list?

Peter Bengtsson [:peterbe]

Assignee

Comment 18

•

11 years ago

Lovely new pull request: https://github.com/mozilla/socorro/pull/1572

[github robot]

Comment 19

•

11 years ago

Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/f55cecde56355d46e4c8983c183df60b06cc4efe fixes bug 889041 - return any raw field from report/list take 2, r=selenamarie

[github robot]

Updated

•

11 years ago

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Selena Deckelmann :selenamarie :selena

Updated

•

11 years ago

Flags: needinfo?(sdeckelmann)

Peter Bengtsson [:peterbe]

Assignee

Comment 20

•

11 years ago

QA Steps to reproduce: 1. Click on Top Crashers (for any version) 2. Click on any of the signatures 3. Click the Reports tab 4. Scroll down and expect there to be lots of new available columns you can add (all with a * suffix) 5. Select some of the columns with a * and click the "Save and reload" button Expected outcome: * Should see the columns selected in the big table

Whiteboard: [qa?]

Lonnen :lonnen

Updated

•

11 years ago

Target Milestone: 61 → 63

Lonnen :lonnen

Comment 21

•

11 years ago

verified on stage

Status: RESOLVED → VERIFIED

Lonnen :lonnen

Updated

•

11 years ago

Whiteboard: [qa?] → [qa+]

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: Socorro → Socorro Graveyard