932944 - Add processed_crash keys to the API whitelist

Reporter

Description

•

11 years ago

https://crash-stats.mozilla.com/api/ProcessedCrash/?crash_id=0ecc6db0-44e1-45c7-92d1-c3e0e2131030 This is a hang report with an additional minidump. Additional minidumps should be listed from the ProcessedCrash API. There are some sub-fields which should only be available to privileged users (exploitability), but most are public. I suspect this may be because of some API blacklist? This makes dealing with hang reports impossible.

Benjamin Smedberg

Reporter

Comment 1

•

11 years ago

In this particular case, it's the upload_file_minidump_browser subtree, but it really can be any upload_file_minidump_* key.

Benjamin Smedberg

Reporter

Comment 2

•

11 years ago

Attached file log — Details

Here is the log from /var/log/messages when it disconnects. This time it took 11 minutes to timeout, instead of what I previously remember which was 3-5 minutes.

Benjamin Smedberg

Reporter

Comment 3

•

11 years ago

oops, wrong bug

K Lars Lohn [:lars] [:klohn]

Comment 4

•

11 years ago

need to change the API whitelist to include the following keys: upload_file_minidump_browser upload_file_minidump_flash1 upload_file_minidump_flash2 :bsmedberg are there others?

Summary: Non-main minidump not showing up in api/ProcessedCrash → Add processed_crash keys to the API whitelist

Benjamin Smedberg

Reporter

Comment 5

•

11 years ago

Currently no, but it would be better to just add upload_file_minidump_*. Also what does this mean for the exploitability key? Does the whitelist apply to subtrees like this?

K Lars Lohn [:lars] [:klohn]

Comment 6

•

11 years ago

Until the adoption of the json MDSW, there is no "sensitive" data stored in the processed_crash in HBase. This whitelist in the UI is a secondary system for redaction, aware only of top level keys and literals matches (no wild cards). Here in this bug, its behavior is shown to be conservative and somewhat overzealous. This is not necessarily a bad thing. A new flexible tree-aware blacklist redaction system is being deployed in the crash storage system. That system is employed by the middleware which, in turn, is employed by the UI. Asking the crash storage system for a processed crash gets in return, a redacted version of the crash. The "sensitive" branch (anywhere is the tree) is pruned along with several other keys. See Bug 930324 for details on this new system. The new redaction system also introduces a unredacted middleware API for fetching an unsensored processed_crash. This new system is not currently employed by the UI. If it is, in the future, it will be an authenticated/authorized service only.

Benjamin Smedberg

Reporter

Comment 7

•

11 years ago

(In reply to K Lars Lohn [:lars] [:klohn] from comment #6) > Until the adoption of the json MDSW, there is no "sensitive" data stored in > the processed_crash in HBase. Are you sure? I'm pretty sure that email, URL, and the "exploitability" key are all in there. This whitelist in the UI is a secondary > system for redaction, aware only of top level keys and literals matches (no > wild cards). Here in this bug, its behavior is shown to be conservative and > somewhat overzealous. This is not necessarily a bad thing. No, I just want these things added to the API whitelist.

K Lars Lohn [:lars] [:klohn]

Comment 8

•

11 years ago

> No, I just want these things added to the API whitelist. yes, this will be done > Are you sure? I'm pretty sure that email, URL, and the "exploitability" key are all in there. if they are there, then it is a bug. The HBase code for saving processed crashes currently in production contains this code fragment: def save_processed(self, processed_crash): @self._wrap_in_transaction def transaction(client, processed_crash=processed_crash): processed_crash = processed_crash.copy() # ... for k in self.config.forbidden_keys: if k in processed_crash: del processed_crash[k] with the value of ```self.config.forbidden_keys``` set to: destination.storage0.forbidden_keys: ['email', 'url', 'user_id', 'exploitability'] ad hoc exploring pulling processed crashes directly from HBase, I cannot find evidence that these keys are present. If you have a counter example, please let me know so that I may make sure the problem is corrected.

[DEACTIVATED] Adrian Gaudebert

Comment 9

•

11 years ago

Those forbidden keys are all stored in our processed crashes in elasticsearch. Perhaps that's what Benjamin was referring to. However, I didn't know those keys where not in our processed crashes in HBase. That could potentially be a problem because our current backfilling app for elasticsearch is based on those processed crashes from HBase. We could be missing a lot of sensitive data that we want to have when backfilling.

K Lars Lohn [:lars] [:klohn]

Comment 10

•

11 years ago

Attached file PR for adding multidump support to django whitelist — Details

Attachment #825856 - Flags: review?(rhelmer)

Benjamin Smedberg

Reporter

Comment 11

•

11 years ago

> with the value of ```self.config.forbidden_keys``` set to: > > destination.storage0.forbidden_keys: ['email', 'url', 'user_id', > 'exploitability'] That's surprising to me. I thought that we perhaps didn't include email/url in the hbase processed crash because they are plain duplicates of the information from the raw crash. But why are we blacklisting certain fields? In particular does this mean that hbase doesn't have the exploitability data at all? There are certainly other fields with PII, so it's not as if this blacklist is by itself a security measure.

K Lars Lohn [:lars] [:klohn]

Comment 12

•

11 years ago

> But why are we blacklisting certain fields? I've traced the blacklisting of 'email', 'url', 'user_id' via a single 'sanitize' method to May of 2009, predating our adoption of HBase. Removal of those keys from the processed crash goes further back, to at least February of 2009. The comment of that commit that resulted in the removal of that info from the processed_crash was,"we're no longer allowed to save the provided email address". I do not have records of who made the edict, but I do recall that it was considered an "emergency" issue resulting in immediate action. The 'sanitizing' of the processed crash has continued to this day, having been ported from the original old processor to the Processo2012 implementation. Aside from adding 'exploitability' to the list of 'forbidden_keys', the redaction system of Bug 930310 is the first re-visitation of that code since it was first written. Because the json MDSW puts sensitive information into the processed_crash, and we want that data retrievable, I made the decision to allow sensitive information into the processed_crash. In this latest version, redaction happens at retrieval time rather than storage time. If we need to change this system, then we should act quickly, it is to be pushed to production this week. We're poised to adopt the new json MDSW. If the redaction methods from Bug 930310 are wrong/undesirable/insufficient then json MDSW is blocked. > In particular does this mean that hbase doesn't have the exploitability data at all? That is correct, 'exploitability' is not in HBase.

Robert Kaiser

Comment 13

•

11 years ago

(In reply to K Lars Lohn [:lars] [:klohn] from comment #12) > The comment of that commit that resulted in the removal > of that info from the processed_crash was,"we're no longer allowed to save > the provided email address". I do not have records of who made the edict, > but I do recall that it was considered an "emergency" issue resulting in > immediate action. That sounds like a privacy decision to me - do we need to have a privacy review on storing this data to HBase with jMDSW?

Benjamin Smedberg

Reporter

Comment 14

•

11 years ago

With the new system from bug 930310, the exploitability data would now be in hbase? I presume that the last paragraph of comment 12 was a question for Laura? I don't remember the history in 2009 unless there were bug numbers associated with those commits, but we certainly are storing email addresses and URLs in hbase already (in the raw crash), so it's not like we're storing anything *more* now.

Flags: needinfo?(laura)

K Lars Lohn [:lars] [:klohn]

Comment 15

•

11 years ago

> With the new system from bug 930310, the exploitability data would now be in hbase? yes, in the code associated with Bug 930310, 'exploitablity' will be saved in the processed_crash within HBase. That key is removed (along with the others on the blacklist) from the return result at the time of a call to the middleware 'get_processed_crash' service. There is a PR pending that adds a new service to return an unredacted processed crash complete with _all_ the sensitive fields.

K Lars Lohn [:lars] [:klohn]

Comment 16

•

11 years ago

Of course, I've again mis-pasted a bug number. In each of my previous posts where I mention Bug 930310, I meant to have referred to Bug 930324.

Robert Helmer [:rhelmer]

Updated

•

11 years ago

Assignee: nobody → lars

Target Milestone: --- → 65

Robert Helmer [:rhelmer]

Updated

•

11 years ago

Attachment #825856 - Flags: review?(rhelmer) → review+

[github robot]

Comment 17

•

11 years ago

Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/631a304e14adfe4de27eae9441572b7cdabeb6bf Merge pull request #1643 from twobraids/whitelist Fixes Bug 932944 - added multidump support to whitelist

[github robot]

Updated

•

11 years ago

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Matt Brandt [:mbrandt]

Comment 18

•

11 years ago

Bumping to verified as [qa-].

Status: RESOLVED → VERIFIED

Whiteboard: [qa-]

Laura Thomson :laura

Comment 19

•

11 years ago

Discussed redaction with lars offline.

Flags: needinfo?(laura)

Benjamin Smedberg

Reporter

Updated

•

11 years ago

Flags: needinfo?(laura)

Benjamin Smedberg

Reporter

Comment 20

•

11 years ago

This was marked FIXED, but I still don't see the other minidumps when I load the URL from comment 0, nor a newer one I just tried to load: https://crash-stats.mozilla.com/api/ProcessedCrash/?crash_id=e167ea3d-8732-4bae-a403-352e32131205

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

K Lars Lohn [:lars] [:klohn]

Comment 21

•

11 years ago

peterbe: I've verified that the underlying crash storage for HBase is delivering the correct data. I've verified the fields in question are in the white list. However, it doesn't appear that the additions to the whitelist have been honored. Do you have suggestions as to where to look next?

Flags: needinfo?(peterbe)

Peter Bengtsson [:peterbe]

Assignee

Comment 22

•

11 years ago

(In reply to K Lars Lohn [:lars] [:klohn] from comment #21) > peterbe: I've verified that the underlying crash storage for HBase is > delivering the correct data. I've verified the fields in question are in > the white list. However, it doesn't appear that the additions to the > whitelist have been honored. Do you have suggestions as to where to look > next? If it's not in https://github.com/mozilla/socorro/blob/master/webapp-django/crashstats/crashstats/models.py#L721 that would be the next thing to work on. This bug was kinda messy and some many chefs in the kitchen I ducked out of the conversation so I'm not entirely sure what's been going on.

Flags: needinfo?(peterbe)

K Lars Lohn [:lars] [:klohn]

Updated

•

11 years ago

Assignee: lars → peterbe

K Lars Lohn [:lars] [:klohn]

Comment 23

•

11 years ago

whitelist should contain: upload_file_minidump_browser upload_file_minidump_flash1 upload_file_minidump_flash2 json_dump

Benjamin Smedberg

Reporter

Comment 24

•

11 years ago

It really needs to be upload_file_minidump_<anythinghere>. If we need to special-case the whitelist for upload_file_minidump, we should do that.

Peter Bengtsson [:peterbe]

Assignee

Comment 25

•

11 years ago

"special-case"?? If, by that, you mean do support something like `upload_file_minidump_*` we'll have to do that then. At the moment, the white list filtering does not support anything smarter than a list of keys.

Peter Bengtsson [:peterbe]

Assignee

Comment 26

•

11 years ago

Is that what you mean by "special-case" (see comment above). That we need support for wildcards?

Flags: needinfo?(benjamin)

Benjamin Smedberg

Reporter

Comment 27

•

11 years ago

Yes. I'm not sure that generic wildcard support is worth it if we could hardcode this special key.

Flags: needinfo?(benjamin)

Peter Bengtsson [:peterbe]

Assignee

Comment 28

•

11 years ago

I noticed that those 4 were added to RawCrash too https://github.com/mozilla/socorro/commit/14a2b9800ec9b2b82793a86b3664c326e1fda275 I'll change that to `upload_file_minidump_*` too. Right lars?

Flags: needinfo?(lars)

Peter Bengtsson [:peterbe]

Assignee

Comment 29

•

11 years ago

Attached file Github PR — Details

[github robot]

Comment 30

•

11 years ago

Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/a59342e8f6c3663e03671322bc23dbcf3d766e4e fixes bug 932944 - Add processed_crash keys to the API whitelist, r=adrian

[github robot]

Updated

•

11 years ago

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Lonnen :lonnen

Updated

•

11 years ago

Target Milestone: 65 → 72

K Lars Lohn [:lars] [:klohn]

Updated

•

11 years ago

Flags: needinfo?(lars)

log 11 years ago Benjamin Smedberg 4.18 KB, text/plain		Details
PR for adding multidump support to django whitelist 11 years ago K Lars Lohn [:lars] [:klohn] 44 bytes, text/x-github-pull-request	rhelmer : review+	Details \| Review
Github PR 11 years ago Peter Bengtsson [:peterbe] 44 bytes, text/x-github-pull-request		Details \| Review