Closed
Bug 889041
Opened 11 years ago
Closed 11 years ago
Socorro API: return any raw field from report/list
Categories
(Socorro Graveyard :: Middleware, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
63
People
(Reporter: benjamin, Assigned: peterbe)
References
Details
(Whiteboard: [qa+])
Related to bug 889040: the report/list API should be configurably able to return additional fields from the raw JSON.
For public users, this should be limited to the whitelisted set of raw fields in the API, bug 858245.
It would be nice for logged-in users to be able to see full raw data, but that that is a secondary goal.
Updated•11 years ago
|
Assignee: nobody → peterbe
Comment 1•11 years ago
|
||
Re-reading this bug, it actually doesn't depend on bug 901977. The report/list page doesn't use elasticsearch at the moment, all its data comes from postgresql.
I am not sure how that overlaps with supersearch, as it will be able to show any field from the processed JSON at first, and raw JSON when 889040 is done (hopefully before the end of q3). We might still want to have that ability on both pages.
No longer depends on: 901977
Updated•11 years ago
|
Target Milestone: --- → 60
Assignee | ||
Updated•11 years ago
|
Target Milestone: 60 → 61
Assignee | ||
Comment 2•11 years ago
|
||
Selena,
My objective is to extend the reports page (Reports tab on the reports list) with columns from the raw crash. http://cl.ly/RSUn
Currently that table is generated by querying socorro.external.postgresql.report:Report.get_list() which, it appears, simply does a plain select from `reports`.
Can I just add an inner join on `raw_crash` and join on uuid and then include `raw_crash` which is a JSON type?
Does that sound right? Then I can unpack that massive JSON thing when I render the template based on which columns specifically the user has chosen to see.
Flags: needinfo?(sdeckelmann)
Assignee | ||
Comment 3•11 years ago
|
||
Sorry, I think it should be a left OUTER join.
Assignee | ||
Comment 4•11 years ago
|
||
Pull request: https://github.com/mozilla/socorro/pull/1538
Comment 5•11 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/8bdd29f03c5aec9db37f11e6d00394a195979986
fixes bug 889041 - return any raw field from report/list, r=AdrianGaudebert
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 6•11 years ago
|
||
The new `LEFT OUTER JOIN` added to that query turned out to be MUCH too slow. Some queries took 10 minutes!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 7•11 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/27ada56def564db9fc430ed22c7cc72ee1a2c039
Revert "fixes bug 889041 - return any raw field from report/list, r=AdrianGaudebert"
This reverts commit 8bdd29f03c5aec9db37f11e6d00394a195979986.
Updated•11 years ago
|
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 8•11 years ago
|
||
Stay reopened darn it!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 9•11 years ago
|
||
Selena,
Can you help us figure out why the left outer join is so impressively slow?
If there is no way to make it faster we're going to have to abandon this approach entirely.
Assignee | ||
Comment 10•11 years ago
|
||
Apparently even the `select count()` causes massive seq scan on raw_crashes!!
http://pastebin.mozilla.org/3147786
Assignee | ||
Comment 11•11 years ago
|
||
So I ran that query (3 days span) on dev:
http://peterbe.pastebin.mozilla.org/3147873
That's 0.2ms to join 14 million reports with 14 million rows in raw_crashes.
On stage, the same query takes 1160ms.
http://pastebin.mozilla.org/3147829
At the time of writing, dev db is at schema revision 608afb3824d and stage is at 523b9e57eba2
Assignee | ||
Comment 12•11 years ago
|
||
OOps! I used the wrong time interval on dev. With August instead I get:
http://peterbe.pastebin.mozilla.org/3147988
97,000 rows took 3,000ms to do a count on.
Comment 13•11 years ago
|
||
(In reply to Peter Bengtsson [:peterbe] from comment #11)
> So I ran that query (3 days span) on dev:
> http://peterbe.pastebin.mozilla.org/3147873
> That's 0.2ms to join 14 million reports with 14 million rows in raw_crashes.
>
> On stage, the same query takes 1160ms.
> http://pastebin.mozilla.org/3147829
Without any WHERE clause on raw_crashes.date_processed, this query seems to be a sequential scan across all raw_crashes partitions. With the WHERE clause it does an index scan.
I am not sure we can compare the contents of dev/stage/prod directly, they potentially have very different sets of data.
> At the time of writing, dev db is at schema revision 608afb3824d and stage
> is at 523b9e57eba2
I looked into this and think it's a red herring (but would be nice to fix) - latest migration doesn't look related.
Comment 14•11 years ago
|
||
(In reply to Peter Bengtsson [:peterbe] from comment #10)
> Apparently even the `select count()` causes massive seq scan on raw_crashes!!
> http://pastebin.mozilla.org/3147786
I see this same result across dev/stage/prod. If you add a WHERE clause for raw_crashes.date_processed (equivalent date range to the one used on reports) all the seq scans go away, and index scans are used on a smaller set of partitions instead.
Reporter | ||
Comment 15•11 years ago
|
||
You'd have to check the EXPLAIN results, but you can also try joining raw_crashes and crashes on raw_crashes.date_processed = reports_clean.date_processed AND raw_crashes.uuid = reports_clean.date_processed. That may cause the query planner to jump to the single correct partition to look it up.
However, meta-question! Aren't the list of crashes in report/list generated from elasticsearch queries anyway? Why couldn't we just use the data from elasticsearch, instead of going back to postgres to get it again?
Comment 16•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #15)
> However, meta-question! Aren't the list of crashes in report/list generated
> from elasticsearch queries anyway? Why couldn't we just use the data from
> elasticsearch, instead of going back to postgres to get it again?
That is not the case, raw crash data is not even in elasticsearch at the moment. PostgreSQL has its own copy of that data. And report/list has always been using PostgreSQL, and not elasticsearch. We had planned to switch some day, but that never happened.
Reporter | ||
Comment 17•11 years ago
|
||
I don't see how it's going to be possible to fix bug 897139 properly without driving report/list from elasticsearch: how can you apply the same search limits to report/list?
Assignee | ||
Comment 18•11 years ago
|
||
Lovely new pull request: https://github.com/mozilla/socorro/pull/1572
Comment 19•11 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/f55cecde56355d46e4c8983c183df60b06cc4efe
fixes bug 889041 - return any raw field from report/list take 2, r=selenamarie
Updated•11 years ago
|
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Flags: needinfo?(sdeckelmann)
Assignee | ||
Comment 20•11 years ago
|
||
QA Steps to reproduce:
1. Click on Top Crashers (for any version)
2. Click on any of the signatures
3. Click the Reports tab
4. Scroll down and expect there to be lots of new available columns you can add (all with a * suffix)
5. Select some of the columns with a * and click the "Save and reload" button
Expected outcome:
* Should see the columns selected in the big table
Whiteboard: [qa?]
Updated•11 years ago
|
Target Milestone: 61 → 63
Updated•11 years ago
|
Whiteboard: [qa?] → [qa+]
Updated•8 years ago
|
Product: Socorro → Socorro Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•