Closed
Bug 858245
Opened 12 years ago
Closed 12 years ago
[socorro-crashstats] Removal of personal identifiable information from public REST API
Categories
(Socorro :: Webapp, task, P1)
Socorro
Webapp
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: peterbe, Assigned: peterbe)
References
Details
All sensitive data should be removed from the data the API returns.
For example, fields such as "email" should simply be removed to scrubbed to blank.
Comment 1•12 years ago
|
||
Commits pushed to master at https://github.com/mozilla/socorro-crashstats
https://github.com/mozilla/socorro-crashstats/commit/983cce83937ceff68f8e5780fae7b0a911daec7b
Bug 858245 - Added a scrubber app to use in the API. r=peterbe
https://github.com/mozilla/socorro-crashstats/commit/9cbc1543e5e5e21a2303cd855e36eacea4b018f3
Merge pull request #328 from AdrianGaudebert/scrubber
Bug 858245 - Added a scrubber app to use in the API.
Updated•12 years ago
|
Assignee: nobody → adrian
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Comment 2•12 years ago
|
||
1) It would be pretty valuable for logged-in/privileged people to have full access. Having a whitelist would also really help expose new fields in the standard UI more easily.
2) For everyone else, we should use a whitelist, not a blacklist. Also I'd suggest that if somebody asks for a field they aren't allowed, we should return a 403 response.
Comment 3•12 years ago
|
||
What are we returning in this API? We have a lot of fields, a whitelist seems pretty unwieldy. We don't have very many sensitive fields, and we don't add them all that often, so maybe a blacklist is okay?
Comment 4•12 years ago
|
||
I don't think a whitelist is privacy-preserving. We add new fields occasionally, and sometimes they are sensitive; I don't want to block client changes on a server blacklist. I have a list of all the *current* fields here:
http://benjamin.smedbergs.us/blog/2013-04-12/graph-of-the-day-crash-report-metadata/
Of the fields we currently have, I believe the following are sensitive:
* URL
* Email
* Android_Logcat
* PluginContentURL
But I'm also not sure about these:
* Android_Fingerprint
* Android_Display
I really don't think that a whitelist is going to be that hard to maintain.
![]() |
||
Comment 5•12 years ago
|
||
From a privacy POV, a whitelist is surely the better approach. From the POV of a developer who wants to get fast results from a speculative annotation, a blacklist is better.
This is a hard topic. It would have been good to have some kind of separation of fields from the beginning, even on the client side, so that an annotation is already marked sensitive or not when it comes to Socorro and we could just expose all that aren't marked that way (or the reverse).
![]() |
||
Comment 6•12 years ago
|
||
Oh, and Android_Display is just the display hardware name, and Android_Fingerprint is a mashup of some generic info on the used Android OS, see e.g. bp-3ce7d41f-59fc-4fa8-b864-f6ea02130507 for those fields.
BTW, why is PluginContentURL something different from URL? Socorro doesn't use the former anywhere atm, if it's good info, we might want to do something with it...
All in all, I really want to get away from having all kinds of info stuffed into App Notes and having everything that is somewhat more permanent have separate fields that we can run searches or even reports against in a good way. App Notes largely has become an excuse for not having raw JSON fields be public anywhere, and e.g. the GFX people stuff things only there because they can run scripts against the daily CSVs that way. I'd like to become cleaner there for sure.
Assignee | ||
Comment 7•12 years ago
|
||
It could be an option to do both. Eg.::
class CrashesPerAdu(SocorroMiddleware):
blacklist = (
'Email',
'URL',
)
...
class ReportList(SocorroMiddleware):
whitelist = (
'field1',
'field2',
'field3',
)
The code that applies the scrubber would simply figure out if it the model has a `blacklist` and if so, reduce. And if it has a `whitelist`, remove everything other than those listed.
Would that make things easier?
Comment 8•12 years ago
|
||
Don't do both. Let's just do a whitelist.
Assignee | ||
Comment 9•12 years ago
|
||
In conclusion, we talked about this at the Socorro meeting. Writing down a white list is the right thing to do.
Adrian, does that make sense?
Assignee | ||
Updated•12 years ago
|
Assignee: adrian → peterbe
Assignee | ||
Comment 10•12 years ago
|
||
Pull request here:
https://github.com/mozilla/socorro-crashstats/pull/364
Every model has a list complete list of all fields as whitelists. Some very few models allow all fields.
There's actually very little that gets filtered ultimately. I think there's actually only **two fields that get removed** at the time of writing.
Assignee | ||
Comment 11•12 years ago
|
||
After talking to :smedberg we concluded that the best way forward regarding the RawCrash model is to include RawCrash and whitelist all the fields that it might return. I'll show the fields to smedberg to add to the patch review.
Also, if the data isn't JSON we shouldn't return anything at all. Only if it's JSON (where it can be whitelist cleaned) it should return on the public API.
Comment 12•12 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro-crashstats
https://github.com/mozilla/socorro-crashstats/commit/cddb33b0d52024d363cb4891a1320bbd712a0ada
fixes bug 858245 - Removal of personal identifiable information from public REST API, r=rhelmer,AdrianGaudebert
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•