Open Bug 1254570 (SHIELD-Variations-Packet-Data0Review) Opened 8 years ago Updated 7 years ago

Data Privacy Review: Shield Variations V1 Data Packet

Categories

(Shield :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: glind, Assigned: rweiss, NeedInfo)

Details

Attachments

(1 file, 2 obsolete files)

We are designing the first SHIELD-Variations Test.  This is a request for privacy and data practices review of the GENERAL "Shield-Variations" packet, to be used by this and all future experiments.

## Population.

1.  A consented / opt-in population.  Recruited (asked) from all users, all channels. 
2.  They will be shown a risks / benefits / data collected page, BEFORE installing.
3.  They WILL NOT be shown / notified again about data collection.

## What is collected:

1.  At these times:
    
    a. startup
    b. shutdown
    c. addon install
    d. addon uninstall
    e. end-of-experiment (days=<7> after start) (this can vary)

2.  Collect and send:

    a. environment
    b. telemetry clientId
    c. ADDITIONAL EXPERIMENT DATA
      
       i. experiment name
       ii. branch
       iii. 'msg' = ['startup|shutdown|install|ineligible|user-ended']

    d. Sent to Telemetry as packet type:  packet: "x-shield-trials".  

3.  Surveys:

    a. user disables experiment
    b. experiment completes.


## How Data Will be Used:

1.  Success of Experiment

    a.  by BRANCH within EXPERIMENT

        i.  total usage hours
        ii.  trimmed median usage hours
        iii. 7 day return rate.
        iv.  uninstall rate
        v.  population overall reporting, to check for massive randomness violations.
        vi.  (additional derived group centrality stats)

## Benefit to User

1.  Preventing bad code / features landing in firefox.
2.  Measuring good and bad impact before full release
3.  Seeing good features / settings earlier


## Where data will be stored

1.  Telemetry Data is stored in Unified Telemetry (at Mozilla)
2.  SurveyGizmo data is stored at SurveyGizmo.  This is covered by our data policy.  (see #751853)
3.  Some data / experiment configuration is stored on the client, via Prefs / LocalStorage.  Experiments pledge to do their best to cleanup and be minimal.

## Risks

### Privacy

1. Usual Unified Telemetry environment packets risks.
2. Connecting survey answers to telemetry data

## Addenda

If there is prior art for this process, happy to follow it.
Do you have an overview of what "shield variations" means? Is this a new name for ideatown-now-testpilot or something else?

Final data review will happen as review of the docs that you commit to your repo, but I may have some intermediate questions such as the nature of the risks/details page. ideatown-now-testpilot has a separate privacy policy for each experiment which distributes the responsibility better.
Flags: needinfo?(glind)
Attached patch bug.1254570.patch (obsolete) — Splinter Review
This is provisional, see notes.  I want to slim this and rename after we do initial study.
Flags: needinfo?(glind) → needinfo?(kparlante)
Attachment #8734789 - Flags: feedback?(benjamin)
Comment on attachment 8734789 [details] [diff] [review]
bug.1254570.patch

+Structure::
>+
>+    {
>+      type: "x-shield-studies",

No need for x-, let's just use "shield-studies".

>+      payload: {
>+        ... version of the study addon ...
>+        version:  <string>,
>+        ... study name ...
>+        name: <string>,

Is this name the same as the addon ID?

Is a particular user only ever in one shield study at a time?

>+        ... study specific branch variation enum ...
>+        variation: <string>,

We've used "branch" for this historically, can we keep the naming consistent?

>+        firstrun: <integer epoch timestamp>,

This is submitted with every ping? If so this is potentially pretty identifying/cross-referenceable. Can we reduce the granularity to the nearest hour or day and still get the effect you need?

>+        surveyUrl: <string>
>+        duration: <integer>

I don't understand what these metrics record. I need more details documented. For example I'd assume in general that the survey URL is the same for everyone in an experiment+branch. If so, why do you need to record it at all?

>+        ... string enum:
>+        msg: <string>,

You've documented this above, but I was confused and missed the connection. Can you be explicit here:

msg: "install|ineligible|running|shutdown|end-of-study|user-ended-study"

What will the running ping be used for?

>+        ... optional testing flag ...
>+        testing: <boolean>

This is for QA purposes?

Since you plan breaking changes, how are you going to version future changes? You should discuss this with gfritzsche/mreid if there's not already a plan.
Flags: needinfo?(glind)
Attachment #8734789 - Flags: feedback?(benjamin)
My understanding is that this is already in production and that mreid is familiar with it. Georg will also take a look to make sure he has a comprehensive view of the pings...
Flags: needinfo?(kparlante) → needinfo?(gfritzsche)
Component: SHIELD → General
Product: Websites → Normandy
BDS, thanks for the review.

Changes made:

To each point:
1.  OK: Name to shield-study
2.  OK: version to 1
3.  MAYBE: NAME doesn't have to be the addon name.  By convention is probably usually will be.  A user can be in more than one study, but this should be rare, and studies can prevent this.  There is an open bug for it [1]
4.  OK: variation => branch
5.  FIRSTRUN: This was used mostly for referencing which day of the study the person was in.  Now that we have good UT date collection this can drop, I think.  
6.  DROPPING surveyUrl and duration, b/c they are the same for every participant in a study
7.  OK: msg enum.  "Running" is a dup of the UT ping.  We duplicate it so that we can do basic analysis without using having to join to the table, using Shield Pings only.
8.  OK: (optional) testing is for qa / filtering purposes.

Updating the patch.

gfritsche, this look good to you?


[1]: https://github.com/gregglind/shield-studies-addon-utils/issues/16
Flags: needinfo?(glind)
I have a couple of suggestions for making field names more descriptive:

"name" -> study_name or study_id
"version" -> study_version (plain old "version" should be reserved for versioning the payload format)
"msg" -> study_state (or _status)
"testing" -> I'm unclear what this field is
Attached patch bug.1254570.v2.patch (obsolete) — Splinter Review
Amended for mreid and bds changes.
Attachment #8734789 - Attachment is obsolete: true
Comment on attachment 8772574 [details] [diff] [review]
bug.1254570.v2.patch

Looks good, but please s/msg/study_state/ in the submission circumstances at the beginning.
Attachment #8772574 - Flags: review?(mreid) → feedback+
Fixes mreid comment
Attachment #8772574 - Attachment is obsolete: true
Comment on attachment 8772955 [details] [diff] [review]
bug.1254570.v3.patch

Looks good as far as my suggestions, Benjamin do you want to re-review?
Flags: needinfo?(benjamin)
Attachment #8772955 - Flags: review?(mreid) → review+
Flags: needinfo?(benjamin)
Attachment #8772955 - Flags: feedback+
I see f+, ready to land and close?
Flags: needinfo?(mreid)
Actually I have some additional questions :)

1. Data that the ADDON wants to send... how should we handle it? I  have been considering adding 

 "source":  specific-specific|general
 "version": 2

2. For packets that come from the addon (for study-specific features), how / do we want to constrain the namespace?  Or should they get YET ANOTHER bucket / type?  

Actual example:
- site-enhance: cares about recommendations offered and given, by eTLD

(In reply to Gregg Lind (Fx Strategy and Insights - Shield - Heartbeat ) from comment #12)
> Actually I have some additional questions :)
> 
> 1. Data that the ADDON wants to send... how should we handle it? I  have
> been considering adding 
> 
>  "source":  specific-specific|general
>  "version": 2
I'm not sure what you mean here, please expand on what this data is and how it differs from the data specified previously

> 
> 2. For packets that come from the addon (for study-specific features), how /
> do we want to constrain the namespace?  Or should they get YET ANOTHER
> bucket / type?  
> 
> Actual example:
> - site-enhance: cares about recommendations offered and given, by eTLD
Wouldn't this be part of "ADDITIONAL EXPERIMENT DATA" described in the User Story?
Flags: needinfo?(mreid)
(sorry I missed this!  Missed the NI here)

Expanding:

- shield (itself) wants to send data home about it state (as described in the bug)
- addons ALSO want to send data home via telemetry.  I would like to make it easy for them to do so, by offering a REPORT function, using a simple wrapper in the shield-studies-addon-lib.  

(see:  https://github.com/mozilla/shield-studies-addon-utils/blob/master/howToShieldStudy.md#extra-measurements)

I am trying to spec out HOW the "ADDITIONAL EXPERIMENT DATA" in the user story looks.  

## By example:

Firefox-Guide cares about orientation.  it wants to phone home when orientation happens.

Should `getPings(doctype='shield-study')` get those 'additional' (orientation) pings or not?  If so what is a good way of telling the 'shield-specific' stuff from the addon-specific stuff?

Happy to real-time about this to get this done :)
I am confused about something here. I data-reviewed this doc, but it appears that it never actually landed? And the docs at https://mozilla.github.io/shield-studies-docs/ link to this bug instead of to any actual live documentation.

Where are the actual checked-in docs for this ping?
Flags: needinfo?(gfritzsche) → needinfo?(glind)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: