Closed Bug 930324 Opened 11 years ago Closed 11 years ago

redaction in the crash storage classes

Categories

(Socorro :: Backend, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lars, Assigned: lars)

References

Details

(Whiteboard: [QA+])

Attachments

(2 files)

In Bug 930310, I suggest that we offer two processed crash services: one redacted and one not.  Rather than leave the process of redaction to the middleware, it should be implemented once in the crashstorage classes so that it is consistently applied without regard to the storage implementation. 

I suggest changing the name of the orignal 'get_processed' method throughout the crash storage heirarchy to "get_unredacted_processed".  It will give out the processed json intact.  The base class will implement a new 'get_processed' that uses 'get_unredacted_processed' and then does the key pruning. 

The keys that get removed should be configurable and, perhaps, allow wildcards. I suggest wildcards because multidump processed crashes will have a "sensitive" section for each of the processed dumps.  It would be nice to be able to specify them in one entry:  "*.sensitive"

Currently some of the individual crashstorage class implement a "forbidden_keys" parameter.  For minimal disruption, we ought to built on that.
for now, the wildcards are over-reaching.  I'll make this as quick and simple as I can and extendible in the future.
Attachment #821881 - Flags: review? → review?(rhelmer)
Attachment #821881 - Flags: review?(rhelmer) → review+
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/7490cd51e9b7316ebe5e6e4696f0678b040817bf
Merge pull request #1620 from twobraids/redacted

fixes Bug 930324 - added processed_crash redaction system
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 65
Whiteboard: [QA+]
Lars wrote a nice little script (which I read, and understand pretty well, I feel), here: http://hastebin.com/bixatoqumi.py

http://hastebin.com/ritosonigo.hs is the output, and shows as follows:

fairview:~/work/socorro7 (verify)$ python socorro/collector/submitter_app.py  --admin.conf=verify3.ini  
--source.sql="select uuid from reports_20131028 where url is not null or email is not null or exploitability is not null limit 100"

checking 5bae8f73-3384-45c3-9e75-68e2f2131028 ok
checking 688681ad-5932-403a-ad5f-a02ee2131028 ok
...etc.
Attached file breakpadVerifier.py
Test script, for posterity (sorry for the formatting woes).
Attachment #825546 - Attachment mime type: text/x-python-script → text/plain
I'm unclear:
1. Does this change the UI and API behavior with respect to revealing private data? The goal is for it to *not* change.
2. Is QA complete?
Flags: needinfo?(lars)
with the new code, if you pull a processed_crash with the web app or middleware API, you will _not_ get any of the 'sensitive' fields that we enumerated in the "forbidden_keys".  In that respect, the new code has not changed the outward behavior of Socorro in any way.  

This change has passed QA.

The change is in inward behavior.  Before, sensitive fields were not stored in the HBase processed crash.  After, the sensitive fields are stored, but the API has been modified to not show them in any manner.  The later Bug 930310 (not landed, blocked) adds a new middleware service to allow to get the processed_crash with the sensitive fields.
Flags: needinfo?(lars)
Bumping to QA verified on stage - thanks lars and stephend!
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: