Closed
Bug 932944
Opened 11 years ago
Closed 11 years ago
Add processed_crash keys to the API whitelist
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
72
People
(Reporter: benjamin, Assigned: peterbe)
Details
(Whiteboard: [qa-])
Attachments
(3 files)
https://crash-stats.mozilla.com/api/ProcessedCrash/?crash_id=0ecc6db0-44e1-45c7-92d1-c3e0e2131030
This is a hang report with an additional minidump. Additional minidumps should be listed from the ProcessedCrash API. There are some sub-fields which should only be available to privileged users (exploitability), but most are public.
I suspect this may be because of some API blacklist?
This makes dealing with hang reports impossible.
Reporter | ||
Comment 1•11 years ago
|
||
In this particular case, it's the upload_file_minidump_browser subtree, but it really can be any upload_file_minidump_* key.
Reporter | ||
Comment 2•11 years ago
|
||
Here is the log from /var/log/messages when it disconnects. This time it took 11 minutes to timeout, instead of what I previously remember which was 3-5 minutes.
Reporter | ||
Comment 3•11 years ago
|
||
oops, wrong bug
Comment 4•11 years ago
|
||
need to change the API whitelist to include the following keys:
upload_file_minidump_browser
upload_file_minidump_flash1
upload_file_minidump_flash2
:bsmedberg are there others?
Summary: Non-main minidump not showing up in api/ProcessedCrash → Add processed_crash keys to the API whitelist
Reporter | ||
Comment 5•11 years ago
|
||
Currently no, but it would be better to just add upload_file_minidump_*.
Also what does this mean for the exploitability key? Does the whitelist apply to subtrees like this?
Comment 6•11 years ago
|
||
Until the adoption of the json MDSW, there is no "sensitive" data stored in the processed_crash in HBase. This whitelist in the UI is a secondary system for redaction, aware only of top level keys and literals matches (no wild cards). Here in this bug, its behavior is shown to be conservative and somewhat overzealous. This is not necessarily a bad thing.
A new flexible tree-aware blacklist redaction system is being deployed in the crash storage system. That system is employed by the middleware which, in turn, is employed by the UI. Asking the crash storage system for a processed crash gets in return, a redacted version of the crash. The "sensitive" branch (anywhere is the tree) is pruned along with several other keys. See Bug 930324 for details on this new system.
The new redaction system also introduces a unredacted middleware API for fetching an unsensored processed_crash. This new system is not currently employed by the UI. If it is, in the future, it will be an authenticated/authorized service only.
Reporter | ||
Comment 7•11 years ago
|
||
(In reply to K Lars Lohn [:lars] [:klohn] from comment #6)
> Until the adoption of the json MDSW, there is no "sensitive" data stored in
> the processed_crash in HBase.
Are you sure? I'm pretty sure that email, URL, and the "exploitability" key are all in there.
This whitelist in the UI is a secondary
> system for redaction, aware only of top level keys and literals matches (no
> wild cards). Here in this bug, its behavior is shown to be conservative and
> somewhat overzealous. This is not necessarily a bad thing.
No, I just want these things added to the API whitelist.
Comment 8•11 years ago
|
||
> No, I just want these things added to the API whitelist.
yes, this will be done
> Are you sure? I'm pretty sure that email, URL, and the "exploitability" key are all in there.
if they are there, then it is a bug. The HBase code for saving processed crashes currently in production contains this code fragment:
def save_processed(self, processed_crash):
@self._wrap_in_transaction
def transaction(client, processed_crash=processed_crash):
processed_crash = processed_crash.copy()
# ...
for k in self.config.forbidden_keys:
if k in processed_crash:
del processed_crash[k]
with the value of ```self.config.forbidden_keys``` set to:
destination.storage0.forbidden_keys: ['email', 'url', 'user_id', 'exploitability']
ad hoc exploring pulling processed crashes directly from HBase, I cannot find evidence that these keys are present. If you have a counter example, please let me know so that I may make sure the problem is corrected.
Comment 9•11 years ago
|
||
Those forbidden keys are all stored in our processed crashes in elasticsearch. Perhaps that's what Benjamin was referring to.
However, I didn't know those keys where not in our processed crashes in HBase. That could potentially be a problem because our current backfilling app for elasticsearch is based on those processed crashes from HBase. We could be missing a lot of sensitive data that we want to have when backfilling.
Comment 10•11 years ago
|
||
Attachment #825856 -
Flags: review?(rhelmer)
Reporter | ||
Comment 11•11 years ago
|
||
> with the value of ```self.config.forbidden_keys``` set to:
>
> destination.storage0.forbidden_keys: ['email', 'url', 'user_id',
> 'exploitability']
That's surprising to me. I thought that we perhaps didn't include email/url in the hbase processed crash because they are plain duplicates of the information from the raw crash. But why are we blacklisting certain fields? In particular does this mean that hbase doesn't have the exploitability data at all?
There are certainly other fields with PII, so it's not as if this blacklist is by itself a security measure.
Comment 12•11 years ago
|
||
> But why are we blacklisting certain fields?
I've traced the blacklisting of 'email', 'url', 'user_id' via a single 'sanitize' method to May of 2009, predating our adoption of HBase. Removal of those keys from the processed crash goes further back, to at least February of 2009. The comment of that commit that resulted in the removal of that info from the processed_crash was,"we're no longer allowed to save the provided email address". I do not have records of who made the edict, but I do recall that it was considered an "emergency" issue resulting in immediate action.
The 'sanitizing' of the processed crash has continued to this day, having been ported from the original old processor to the Processo2012 implementation. Aside from adding 'exploitability' to the list of 'forbidden_keys', the redaction system of Bug 930310 is the first re-visitation of that code since it was first written. Because the json MDSW puts sensitive information into the processed_crash, and we want that data retrievable, I made the decision to allow sensitive information into the processed_crash. In this latest version, redaction happens at retrieval time rather than storage time.
If we need to change this system, then we should act quickly, it is to be pushed to production this week. We're poised to adopt the new json MDSW. If the redaction methods from Bug 930310 are wrong/undesirable/insufficient then json MDSW is blocked.
> In particular does this mean that hbase doesn't have the exploitability data at all?
That is correct, 'exploitability' is not in HBase.
Comment 13•11 years ago
|
||
(In reply to K Lars Lohn [:lars] [:klohn] from comment #12)
> The comment of that commit that resulted in the removal
> of that info from the processed_crash was,"we're no longer allowed to save
> the provided email address". I do not have records of who made the edict,
> but I do recall that it was considered an "emergency" issue resulting in
> immediate action.
That sounds like a privacy decision to me - do we need to have a privacy review on storing this data to HBase with jMDSW?
Reporter | ||
Comment 14•11 years ago
|
||
With the new system from bug 930310, the exploitability data would now be in hbase?
I presume that the last paragraph of comment 12 was a question for Laura? I don't remember the history in 2009 unless there were bug numbers associated with those commits, but we certainly are storing email addresses and URLs in hbase already (in the raw crash), so it's not like we're storing anything *more* now.
Flags: needinfo?(laura)
Comment 15•11 years ago
|
||
> With the new system from bug 930310, the exploitability data would now be in hbase?
yes, in the code associated with Bug 930310, 'exploitablity' will be saved in the processed_crash within HBase. That key is removed (along with the others on the blacklist) from the return result at the time of a call to the middleware 'get_processed_crash' service. There is a PR pending that adds a new service to return an unredacted processed crash complete with _all_ the sensitive fields.
Comment 16•11 years ago
|
||
Of course, I've again mis-pasted a bug number. In each of my previous posts where I mention Bug 930310, I meant to have referred to Bug 930324.
Updated•11 years ago
|
Assignee: nobody → lars
Target Milestone: --- → 65
Updated•11 years ago
|
Attachment #825856 -
Flags: review?(rhelmer) → review+
Comment 17•11 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/631a304e14adfe4de27eae9441572b7cdabeb6bf
Merge pull request #1643 from twobraids/whitelist
Fixes Bug 932944 - added multidump support to whitelist
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(laura)
Reporter | ||
Comment 20•11 years ago
|
||
This was marked FIXED, but I still don't see the other minidumps when I load the URL from comment 0, nor a newer one I just tried to load: https://crash-stats.mozilla.com/api/ProcessedCrash/?crash_id=e167ea3d-8732-4bae-a403-352e32131205
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment 21•11 years ago
|
||
peterbe: I've verified that the underlying crash storage for HBase is delivering the correct data. I've verified the fields in question are in the white list. However, it doesn't appear that the additions to the whitelist have been honored. Do you have suggestions as to where to look next?
Flags: needinfo?(peterbe)
Assignee | ||
Comment 22•11 years ago
|
||
(In reply to K Lars Lohn [:lars] [:klohn] from comment #21)
> peterbe: I've verified that the underlying crash storage for HBase is
> delivering the correct data. I've verified the fields in question are in
> the white list. However, it doesn't appear that the additions to the
> whitelist have been honored. Do you have suggestions as to where to look
> next?
If it's not in https://github.com/mozilla/socorro/blob/master/webapp-django/crashstats/crashstats/models.py#L721 that would be the next thing to work on.
This bug was kinda messy and some many chefs in the kitchen I ducked out of the conversation so I'm not entirely sure what's been going on.
Flags: needinfo?(peterbe)
Updated•11 years ago
|
Assignee: lars → peterbe
Comment 23•11 years ago
|
||
whitelist should contain:
upload_file_minidump_browser
upload_file_minidump_flash1
upload_file_minidump_flash2
json_dump
Reporter | ||
Comment 24•11 years ago
|
||
It really needs to be upload_file_minidump_<anythinghere>. If we need to special-case the whitelist for upload_file_minidump, we should do that.
Assignee | ||
Comment 25•11 years ago
|
||
"special-case"??
If, by that, you mean do support something like `upload_file_minidump_*` we'll have to do that then.
At the moment, the white list filtering does not support anything smarter than a list of keys.
Assignee | ||
Comment 26•11 years ago
|
||
Is that what you mean by "special-case" (see comment above). That we need support for wildcards?
Flags: needinfo?(benjamin)
Reporter | ||
Comment 27•11 years ago
|
||
Yes. I'm not sure that generic wildcard support is worth it if we could hardcode this special key.
Flags: needinfo?(benjamin)
Assignee | ||
Comment 28•11 years ago
|
||
I noticed that those 4 were added to RawCrash too
https://github.com/mozilla/socorro/commit/14a2b9800ec9b2b82793a86b3664c326e1fda275
I'll change that to `upload_file_minidump_*` too. Right lars?
Flags: needinfo?(lars)
Assignee | ||
Comment 29•11 years ago
|
||
Comment 30•11 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/a59342e8f6c3663e03671322bc23dbcf3d766e4e
fixes bug 932944 - Add processed_crash keys to the API whitelist, r=adrian
Updated•11 years ago
|
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Target Milestone: 65 → 72
Updated•11 years ago
|
Flags: needinfo?(lars)
You need to log in
before you can comment on or make changes to this bug.
Description
•