[SHIELD] Data Review for Context Graph Recommendation Engine Experiment

Status

RESOLVED FIXED
2 years ago
8 months ago

People

(Reporter: telliott, Assigned: telliott)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

2 years ago
The data being collected for this experiment is simple: URL, tab ID, time accessed, and dwell time. A random ID is associated with this data and is not linkable to Telemetry or any other Firefox identifier.

Telemetry is not being extended for this project. 

Data handling policy is documented at https://docs.google.com/document/d/1RSAN6SvrQm0wIOpLTYRtW5KuJPSABS3akEY0s5tFL1c/edit

The study will run for 3 months. Data analysis will be coordinated between hanno, rtilder and rweiss. Because we intend to run multiple experiments (some suggested by outside contributors), we do not have an explicit data analysis plan.

We will be collecting the data on a separate server. IDs will not be included in the Apache logs.
(Assignee)

Updated

2 years ago
Flags: needinfo?(vng)
Flags: needinfo?(rweiss)
(Assignee)

Updated

2 years ago
User Story: (updated)
(Assignee)

Updated

2 years ago
Assignee: nobody → telliott
(Assignee)

Updated

2 years ago
No longer depends on: 1294055

Comment 1

2 years ago
Are we not collecting *any* basic telemetry? Nothing from the UT environment or system fields? Also, the data is not going to the centralized Unified Telemetry system, so if we need to use Spark or something similar to analyze it, will the data be available from an S3 or Redshift data source?
Flags: needinfo?(rweiss) → needinfo?(telliott)
(Assignee)

Comment 2

2 years ago
We are not collecting any environment data due to user-identification concerns. If there's a particular piece you think would be compelling and not too risky for privacy, Victor can probably sneak it in, but it would still get routed to our system.

We are not using any centralized systems due to access concerns. The data is being put into a custom Postgres DB that, outside of ops (which should basically be Jason), only Ryan and Hanno have access to. If we need a Redshift cluster, we'll create one there and import the data.
Flags: needinfo?(telliott)

Comment 3

2 years ago
No - we're not collecting any basic telemetry in this addon. We have the infrastructure in the addon to do so - it's supplied out of the box by shield-addons-utils - but we don't use any of it at this point.

There's no 'sneaking' it in - we actually import all the necessary bits because of the shield utilities.
Flags: needinfo?(vng)

Comment 4

2 years ago
Okay, I don't think there is any problem with what you are collecting from a privacy perspective, given the opt-in, full-consent model of SHIELD. I will, however, suggest that you might want to collect basic telemetry with the addon as well, purely from an analysis perspective. Basic telemetry has a lot of potentially useful covariates that would help enrich the feature space that your addon appears to be collecting.

data-review=r+ from me (close the bug).

Updated

8 months ago
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED