Closed Bug 847939 Opened 11 years ago Closed 11 years ago

add saving complete raw crash json to PG

Categories

(Socorro :: Backend, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lars, Assigned: lars)

References

Details

(Whiteboard: [qa-])

The reports table saves only a small subset of the raw crash data.  Each time a new field is added to the raw_crash and someone wants access to it, we have to modify the schema of the reports table.  By saving the entire raw crash JSON, we no longer have to modify the reports table.
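
A minimal sketch of the idea, not the actual Socorro schema: the table name `raw_crashes`, its columns, and the helper functions are hypothetical, and SQLite (from the Python standard library) stands in for PostgreSQL so the example is self-contained. It shows that a crash carrying a previously unseen field can be stored and read back without any schema change.

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_crashes (uuid TEXT PRIMARY KEY, raw_crash TEXT NOT NULL)"
)


def save_raw_crash(conn, uuid, raw_crash):
    """Store the complete raw crash as a single JSON document."""
    conn.execute(
        "INSERT INTO raw_crashes (uuid, raw_crash) VALUES (?, ?)",
        (uuid, json.dumps(raw_crash)),
    )


def get_field(conn, uuid, field):
    """Read one field back out of the stored JSON, or None if absent."""
    row = conn.execute(
        "SELECT raw_crash FROM raw_crashes WHERE uuid = ?", (uuid,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]).get(field)


# The second crash carries a field the first one lacks; no ALTER TABLE is needed.
save_raw_crash(conn, "crash-1", {"ProductName": "Firefox", "Version": "19.0"})
save_raw_crash(
    conn,
    "crash-2",
    {"ProductName": "Fennec", "Version": "20.0", "SomeNewField": "x"},
)
```

Calling `get_field(conn, "crash-2", "SomeNewField")` then reads the new field directly from the stored document, while the same call for `crash-1` simply finds nothing.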

This will necessitate getting the crashmover to write to both HBase and PG.  This change enables the crashmover to do so: using the `PolyCrashStorage` class in the crashmover with both HBase and Postgres will allow this.
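
To illustrate the fan-out idea, here is a rough sketch under stated assumptions. It is not Socorro's `PolyCrashStorage` implementation: the classes `InMemoryStorage` and `FanOutStorage` and the `save_raw_crash(raw_crash, crash_id)` signature are hypothetical, and the two in-memory stores merely stand in for HBase and Postgres.

```python
class InMemoryStorage:
    """Hypothetical stand-in for a real backend such as HBase or Postgres."""

    def __init__(self, name):
        self.name = name
        self.crashes = {}

    def save_raw_crash(self, raw_crash, crash_id):
        # Copy so later changes by the caller do not alter the stored crash.
        self.crashes[crash_id] = dict(raw_crash)


class FanOutStorage:
    """Hypothetical wrapper that writes each crash to every configured store."""

    def __init__(self, stores):
        self.stores = list(stores)

    def save_raw_crash(self, raw_crash, crash_id):
        failures = []
        for store in self.stores:
            try:
                store.save_raw_crash(raw_crash, crash_id)
            except Exception as exc:
                # Keep going so one failing backend does not block the others.
                failures.append((store.name, exc))
        if failures:
            names = ", ".join(name for name, _ in failures)
            raise RuntimeError("failed to save %s to: %s" % (crash_id, names))


hbase_like = InMemoryStorage("hbase")
postgres_like = InMemoryStorage("postgres")
storage = FanOutStorage([hbase_like, postgres_like])
storage.save_raw_crash({"ProductName": "Firefox"}, "crash-1")
```

In this sketch the caller talks to a single storage object, so the crashmover itself would not need to know how many backends sit behind it.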
Is any part of this ticket resolved in bug 843788 and this merged pull request https://github.com/mozilla/socorro/pull/1096 ?
Of course, one thing that could be strange here is that we restrict access to fields with private data, and the raw JSON definitely contains private data - so we might need to restrict access to that as well - which makes it harder for someone like me to run reports against that data as the user we access the DB with doesn't have access to private data, and so we need to move fields from that raw JSON into a different, public spot again if we want us to have access...
Also, note that there can be quite huge stuff in the raw JSON, like the 200 lines of logcat on newer Android versions.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #2)
> Of course, one thing that could be strange here is that we restrict access
> to fields with private data, and the raw JSON definitely contains private
> data - so we might need to restrict access to that as well - which makes it
> harder for someone like me to run reports against that data as the user we
> access the DB with doesn't have access to private data, and so we need to
> move fields from that raw JSON into a different, public spot again if we
> want us to have access...
> Also, note that there can be quite huge stuff in the raw JSON, like the 200
> lines of logcat on newer Android versions.

Thanks for raising these issues. This is all easily addressed - and I think largely *already addressed*, for the following reasons: 

1) There is currently a limited amount of PII in PostgreSQL, primarily in the reports table and also in the email-related table (a feature that hasn't been turned on yet, afaik).

2) PostgreSQL has supported column-level permissions since version 8.4: http://www.postgresql.org/docs/current/static/sql-grant.html 

And the analyst user already operates with a limited view of the data.

3) The raw JSON will be in its own table.

Our suspicion is that we will be creating special reporting tables with aggregate information from the primary JSON table.
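
As a hedged illustration of points 2 and 3, the sketch below builds the kind of statements that could enforce this. The role name `analyst`, the table name `raw_crashes`, and the column names are assumptions for the example, not the real Socorro setup, and the helpers do no identifier quoting, so they are only suitable for trusted, hard-coded names. The `GRANT SELECT (columns) ON table TO role` form is PostgreSQL's column-level privilege syntax.

```python
def column_grant(table, columns, role):
    """Build a column-level SELECT grant (PostgreSQL 8.4+ syntax).

    No identifier quoting is done, so pass only trusted, hard-coded names.
    """
    return "GRANT SELECT (%s) ON %s TO %s;" % (", ".join(columns), table, role)


def revoke_all(table, role):
    """Build a statement removing every privilege the role has on a table."""
    return "REVOKE ALL ON %s FROM %s;" % (table, role)


# Hypothetical names: expose only non-private columns of reports to the
# analyst role, and keep the raw JSON table off limits to it entirely.
statements = [
    column_grant("reports", ["uuid", "product", "version"], "analyst"),
    revoke_all("raw_crashes", "analyst"),
]
```

Aggregate reporting tables built from the JSON table could then be granted to the same role in the usual way.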

Let me know if I've missed something here!
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/f621c7fa09c9f5e1e5485c2e913d9790e1c8e4e3
Merge pull request #1105 from twobraids/Bug847939-rawcrash-in-pg

Fixes Bug 847939: Raw crash in pg
Blocks: 803209
Target Milestone: --- → 38
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [qa-]