I haven't been able to get real data on email field usage by asking around, and trying has been time-intensive. I don't think it's worth continuing that way; instead, we should collect the data programmatically.
There are several points where the email address data shows up:
- the report view
- the supersearch API
- the raw crash API
In all of these cases, the user needs to be logged in and have PII permissions, so we always know who's accessing the field.
The first two are in the webapp. Travis suggested hiding the email address data behind a click-wall: clicking it logs the event, then shows the email address to the user. We have a very limited set of users, so we could (ab)use Grafana for raw counts (who looked, when). Alternatively, we could create a db table and add an entry per click, which would give us more detailed information (who looked, which crash id, when).
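A minimal sketch of the db table option, assuming one row per click on the click-wall. The table name `email_view_event`, its columns, and the helpers `record_email_view` and `views_per_user` are all hypothetical, and SQLite stands in for whatever database the webapp actually uses:

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS email_view_event (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    crash_id TEXT NOT NULL,
    viewed_at TEXT NOT NULL
)
"""


def record_email_view(conn, username, crash_id, viewed_at=None):
    """Store one row per click: who looked, which crash id, when."""
    viewed_at = viewed_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO email_view_event (username, crash_id, viewed_at) "
        "VALUES (?, ?, ?)",
        (username, crash_id, viewed_at.isoformat()),
    )
    conn.commit()


def views_per_user(conn):
    """Return (username, count) rows, most active user first."""
    return conn.execute(
        "SELECT username, COUNT(*) FROM email_view_event "
        "GROUP BY username ORDER BY COUNT(*) DESC, username"
    ).fetchall()


# In-memory database so the sketch runs standalone.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_email_view(conn, "alice", "crash-id-1")
record_email_view(conn, "alice", "crash-id-2")
record_email_view(conn, "bob", "crash-id-1")
print(views_per_user(conn))
```

Keeping the crash id in each row is what makes the more detailed analysis possible; the Grafana option would only give the aggregate counts.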
The last two are APIs. With the supersearch API, the user has to explicitly include the email field in the list of columns/facets to get back, so we can record an event based on whether "email" is in the requested columns/facets. Earlier, I was concerned that the Crash Stats site itself uses the supersearch API a lot, but I think recording those cases as well is the right thing to do. Two open questions: do we want to log a single event that the user requested the email field, or do we want to log an event for each crash id the user saw?
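The check for whether a supersearch request asks for the email field could look roughly like this. It is only a sketch: it assumes the request parameters arrive as a plain dict and that the columns and facets are passed under the keys `_columns` and `_facets`, and the helper name `requests_email` is hypothetical:

```python
def requests_email(params, field="email"):
    """Return True if the field is among the requested columns or facets.

    Assumes params is a dict and that columns/facets live under the
    "_columns" and "_facets" keys, each a string or a list of strings.
    """
    requested = []
    for key in ("_columns", "_facets"):
        value = params.get(key, [])
        if isinstance(value, str):
            # A single value may arrive as a bare string rather than a list.
            value = [value]
        requested.extend(value)
    return field in requested


print(requests_email({"_columns": ["uuid", "email"]}))
print(requests_email({"_facets": "email"}))
print(requests_email({"_columns": ["uuid", "signature"]}))
```

The comparison is against whole field names, so a column such as `email_domain` would not count as a request for `email`.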
The raw crash API always returns the email address. Since the user isn't explicitly signaling that they plan to look at the email address, I think we should record a separate "maybe-used" event so we can differentiate it from an explicit request.
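One way to keep the two kinds of events apart, as a rough sketch. The event names, the `source` labels, and the helpers `event_for` and `count_by_kind` are all hypothetical:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EmailAccess(Enum):
    # The user explicitly asked for the email (click-wall, supersearch columns/facets).
    REQUESTED = "email_requested"
    # The email was returned without an explicit request (raw crash API).
    MAYBE_USED = "email_maybe_used"


@dataclass(frozen=True)
class EmailEvent:
    username: str
    kind: EmailAccess
    source: str


def event_for(source, username):
    """Build an event; only the raw crash API yields a maybe-used event."""
    if source == "raw_crash_api":
        kind = EmailAccess.MAYBE_USED
    else:
        kind = EmailAccess.REQUESTED
    return EmailEvent(username, kind, source)


def count_by_kind(events):
    """Count events per kind so the two can be analyzed separately."""
    return Counter(event.kind for event in events)


events = [
    event_for("report_view", "alice"),
    event_for("supersearch_api", "alice"),
    event_for("raw_crash_api", "bob"),
]
print(count_by_kind(events))
```

Recording the source alongside the kind keeps the option open to reclassify events later if the maybe-used bucket turns out to be too coarse.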
So rough scope of work:
- half-week: figure out how we want to capture events, then implement both recording (getting the data in) and retrieval for analysis (getting the data back out)
- 2 days: implement the click-wall in the report view
- 1 day (may not need to do this): implement the click-wall in supersearch
- 1 day: implement event recording in the supersearch API (need to make sure using supersearch doesn't double-count)
- 1 day: implement event recording in raw crash API
My rough estimate is 1-2 weeks of work to implement email data usage tracking.