Expose telemetry client_id to about:addons and AMO remove SHIELD dependency from TAAR

VERIFIED FIXED in Firefox 65

Status

()

enhancement
P1
normal
VERIFIED FIXED
11 months ago
6 months ago

People

(Reporter: mlopatka, Assigned: mixedpuppy)

Tracking

({sec-other})

unspecified
Firefox 66
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox65+ verified, firefox66+ verified)

Details

(Whiteboard: [adv-main65-])

Attachments

(4 attachments, 1 obsolete attachment)

No description provided.
In order to allow TAAR to function without a SHIELD addon to mediate the delivery of the Telemetry client_id, the
The telemetry client_id is then used to lookup other telemetry fields from a dynamoDB instance running apart from main Telemetry. These characteristics are the basis of personalized recommendations.

We need to remove the necessity for a SHIELD addon to pass the client_id.

TAAR repo: https://github.com/mozilla/taar
Does not appear to be a security bug. If this is incorrect please find us in #security or send mail to security@mozilla.org
Group: toolkit-core-security
[:ddurst] specifically requested that we mark this bug as a security issue to keep it private. We will be discussing a potential change to the security model of Firefox in the context of exposing the client_id selectively, this would mean a fundamental change tot he FX attack surface.
Group: toolkit-core-security, mozilla-employee-confidential
Sorry, dveditz, I had this on my list late last week and neglected to comment.

The proposal for this is to pass the client id as a header -- we would use this for AMO only initially, but certainly could see use cases for it wherever we use TAAR-based recommendations. That's the part that concerns me, and which I thought security would want to consider.
Flags: needinfo?(dveditz)
So this bug is a security review request, rather than a known flaw or feature work? If so please move to the firefox security review request component (or if it's AMO maybe we need ulfr's team instead)
Flags: needinfo?(dveditz) → needinfo?(ddurst)
Keywords: sec-other
OK. Sorry for the runaround. This is a review request for the proposed feature (passing the client ID via header to whitelisted sites).
Group: toolkit-core-security
Component: General → Security: Review Requests
Flags: needinfo?(ddurst)
Product: WebExtensions → Firefox
Also, I should probably loop in Marshall wrt Privacy because of the client ID (noting that this is the same client ID we use for telemetry).
Flags: needinfo?(merwin)
We've reviewed this as part of the discussions about TAAR and have approved.  We should consider this (passing the client ID via header to a whitelisted site) on a case by case basis do determine if the client ID is really needed and determine how the proposal will change the privacy risk.  But as for the TAAR point, we are good with this moving forward.  

To Martin's broader point, it certainly seems worth considering the change to the security model in light of this work.
Flags: needinfo?(merwin)
We had a follow up discussion with Martin and Travis today, and we would prefer to have Firefox send a hash of the client_id to AMO, to avoid having AMO know the actual client_id.

This would prevent AMO from being in a position to correlate IDs with telemetry, while allowing us to take a given client_id, hash it, and look it up in the AMO logs.

A simple sha256(client_id) ought to be enough here. needinfo :ddurst for his take on feasibility.
Flags: needinfo?(ddurst)
Assignee: nobody → jvehent
I don't have any issue with this.
Flags: needinfo?(ddurst)
This iteration of the patch only adds the header to the discover url, not to all AMO requests.  A couple questions.

- Why not post data or get param?
- Does this really need to be attached to all AMO requests.  Using a channel listener could do that easily, but adds some overheader, even if trivial, to all requests, where AMO requests are an insignificant portion of requests.
Adding an NI for ddurst.

Perhaps some context that may help answer the second question. We would like to have the client_id hash available on both AMO and the about:addons page, which will allow us to improve recommendation quality on AMO for Firefox clients. This also has implications for the first question. If a client is browsing AMO, then no post is occurring.
Flags: needinfo?(ddurst)
See Also: → 1499470
(In reply to mlopatka from comment #14)
> Adding an NI for ddurst.
> 
> Perhaps some context that may help answer the second question. We would like
> to have the client_id hash available on both AMO and the about:addons page,
> which will allow us to improve recommendation quality on AMO for Firefox
> clients. This also has implications for the first question. 

Already knew that.  client_id is by nature available to us in about:addons, it needs to get to amo of course.  The shield study added client_id to the discovery url, and if that's all that is necessary, then the approach is different than otherwise.  The context I'm missing is, do we need it elsewhere and is that pervasive (i.e. all amo requests), if so why, or is that for one or two specific urls.

> If a client is
> browsing AMO, then no post is occurring.

What is the use case for having client_id passed to every page when browsing?  For example, a hashed client_id could just be passed on a ping url which would set a cookie so AMO can later access that.  That way we don't have to install a listener or do any other modification.
Blocks: 1499934
:mixedpuppy The usecase we're trying to solve is : "TAAR requires the hashed telemetry ID available whenever any a.m.o page is loaded as TAAR is expected to generate suggestions for all Firefox users browsing https://addons.mozilla.org/".  

Whether or not this is implemented within a cookie or a header seems to be an implementation detail that TAAR itself isn't concerned with.  All we care about is that AMO eventually makes a request to the TAAR service that includes the hash.

I'm not familiar with ping URLs in AMO.  Can we still guarantee that we'll get the hashed telemetry ID on every page?

Will that work with cookie destroying add-ons?
Flags: needinfo?(mixedpuppy)
Had a meeting w/Scott, Kev and ddurst.  From that:

There currently are no concrete plans to do any recommendations beyond disco pane, so we will go forward with the current patch here.  If and when any plans to expand beyond disco pane solidify, we will likely change the implementation to either a header or cookie, but we will delay that conversation to such a time.

Additionally, 

- The client_id will not be sent when the user is in private browsing mode.  
- The pref, bug 1499470, will also enable/disable sending the client id.
- A couple other items are waiting decision which should happen (hopefully) monday.
Flags: needinfo?(mixedpuppy)
Sounds like a good approach to me.
Just a point of clarification, We already *are* providing recommendations beyond disco pane. Each addons detail page on AMO provides recommendations based on a model derived from addon co-installation data that is leveraging Firefox Telemetry data. The quality of those recommendations could be drastically improved with access to the hashed telemetry client_id.

If we are not moving forward with revealing the hashed client_id on any AMO page, then we will not be able to iterate on the TAAR technology. The quality of recommendations can not be expected to improve.
Let's have the discussion about header vs cookie here. A more forward-looking option is to use one of these as the vehicle for the hashed ID.

We're going to have a preference that results in the setting of a header for networks requests to AMO, or results in the setting of a cookie for AMO. In both cases there's some overhead. In the case of the cookie, there's also the edge case that a user or extension clears cookies and whether we reset that cookie on the next session start or we basically treat it as an intentional flip of the pref.

Beyond that, the question we have is whether either solution should:

a) be disabled if the user has enabled DNT (we suspect that as first-party here, we do not need to do that)
b) be a concern going forward in terms of GDPR -- this is not an issue now, but if we were to include this ID on some other mozilla property in the future, it suddenly becomes a possible tracking issue.

If neither the cookie nor header worries Legal or Privacy or Security, then it comes down to engineering choice on ease of implementation and/or performance.

If either of the options does worry Legal or Privacy or Security, then that should inform which is the less problematic approach.

I know this was sort of discussed in #c9, but worth calling it out now that the hashing, at least, is settled. And if this is something we'll just revisit when/if another (non-AMO) site is intended to be included, then so be it.
Flags: needinfo?(merwin)
(Also back to :ulfr because the model changes with that discussion.)
Flags: needinfo?(ddurst) → needinfo?(jvehent)
I'm a proponent of the cookie approach.  All the management of private browsing, ensuring secure channels, etc. are built in.  If we do a different header, we then also have to implement a number of items that cookies give us in addition to listening on all requests to watch for amo requests.

The downside of course is those users who use cookie removal extensions.  A rough thumb to the air (search cookie on amo and guestimate a total user base) shows less than 1M users, less than 1% of our user base.  That's edge case enough it shouldn't matter.  Recommendations will already need to account for a lack of clientId for users who disable it (see bug 1499470).

Security can look at the cookie version of the patch I uploaded to see if they have any concern.
No concern on my side. I like the cookie approach better too. We should still hash the ID per comment 10.
Flags: needinfo?(jvehent)
(In reply to Julien Vehent [:ulfr] from comment #23)
> No concern on my side. I like the cookie approach better too. We should
> still hash the ID per comment 10.

It is hashed.  As well, the cookies are marked secure, httpOnly, samesite-lax and non-PrivateBrowsing.

One item I'd point out is that the patch uses a pref for the list of sites that will get the cookie, currently only AMO.
This requires Data Collection Review[1]. (Since I'm involved in the technical review side, I cannot serve as Data Steward for the reivew). One piece of the review that might be very relevant is the need for public documentation of the collection (specifically including any ids being sent) and clear identification of the opt-out mechanism in that documentation and in the review.

[1]: https://wiki.mozilla.org/Firefox/Data_Collection
Attachment #9016093 - Attachment is obsolete: true
For the patch here specifically, it's well covered by automated tests.  I think any QE involvement should probably focus on the bigger picture of the FX->AMO->TAAR interaction.
Flags: qe-verify-
Submitting data collection review request form.
Flags: needinfo?(francois)
Attachment #9021859 - Flags: review?(francois)
Flags: needinfo?(merwin)
Comment on attachment 9021859 [details]
data collection review request

As :chutten mentioned in comment #25, one of the requirements for our data collection is that this be documented publicly somewhere. This could be the data review request itself if this bug were public for example.

I have read through this bug and some of the linked documents, but I haven't found anything that could qualify as public documentation for this. Did I miss something?

Also, I opened the devtools network tab while loading about:addons and didn't see any third-party script resources on that page, which is great. Do we have a policy against adding third-party-hosted scripts on there in the future?
Flags: needinfo?(francois)
Attachment #9021859 - Flags: review?(francois) → review-
(In reply to François Marier [:francois] from comment #28)
> Comment on attachment 9021859 [details]
> data collection review request
> 
> As :chutten mentioned in comment #25, one of the requirements for our data
> collection is that this be documented publicly somewhere. This could be the
> data review request itself if this bug were public for example.
> 
> I have read through this bug and some of the linked documents, but I haven't
> found anything that could qualify as public documentation for this. Did I
> miss something?

I'm thinking this bug could be made public.

> Also, I opened the devtools network tab while loading about:addons and
> didn't see any third-party script resources on that page, which is great. Do
> we have a policy against adding third-party-hosted scripts on there in the
> future?

Not an issue.  The cookie is set in a way that about:addons nor any content contained within it would never see cookie, only the server receiving the cookie will.  The server will have to produce content based on that.
@ddurst, should we make this bug public?
Flags: needinfo?(ddurst)
I suppose so, though I expect that mlopatka should be the arbiter of where the public statement of this would live. Either here as a public bug, or elsewhere where we describe this unique hash more generally.
Flags: needinfo?(ddurst) → needinfo?(mlopatka)
I would propose that we include a detailed and transparent statement regarding the data collection in the TAAR repo.
The original motivation for making this bug private was to avoid leaking potential changes to Firefox's attack surface until after we have discussed it.

I have no problem making this bug public per-se, but I'd prefer to have a clearly articulated statement prepared and associated with the repo that we can point to from anywhere else where this discussion (or similar ones) may be had.

I'll submit a PR to the TAAR repo with data collection documentation and then circle back here.
Flags: needinfo?(mlopatka)
Public facing documentation added via PR: https://github.com/mozilla/taar/pull/131/files

Comments/PRs welcome
@ddurst, @mixedpuppy, @francois please take a look at the full `DATA_COLLECTION_POLICY.md` file under PR#131 and let me know if this is sufficiently and accurately capturing the contents of this bug.

NI: Chutten to please flag an appropriate Data Steward as francois is currently not accepting NI requests.
Renewed data steward review is requested with public facing data collection policy documentation now in the TAAR repo.
Flags: needinfo?(chutten)
Comment on attachment 9021859 [details]
data collection review request

Any Data Steward from the list[1] can be asked for review, so I choose... :liuche :)

[1]: https://wiki.mozilla.org/Firefox/Data_Collection
Flags: needinfo?(chutten)
Attachment #9021859 - Flags: review- → review?(liuche)
Comment on attachment 9021859 [details]
data collection review request

Data-review+
This request includes using a new identifier, but this has been cleared by :merwin and is covered by the privacy policy.

1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes, on TAAR github: https://github.com/mozilla/taar/blob/master/DATA_COLLECTION_POLICY.md

2) Is there a control mechanism that allows the user to turn the data collection on and off? 
Yes, turning off telemetry, or turning off a pref

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**
No, 6mo collection

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Type 2 - linking client to behaviors for AMO recommendations

5) Is the data collection request for default-on or default-off?
Default on, opt-out

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

New identifier, hashed telemetry client id

7) Is the data collection covered by the existing Firefox privacy notice? **If unsure: escalate to legal if:**
This data includes a hash of a telemetry client id, but asked :merwin and collecting this has been cleared in https://bugzilla.mozilla.org/show_bug.cgi?id=1489531#c9

8) Does there need to be a check-in in the future to determine whether to renew the data? (Yes/No) (If yes, set a todo reminder or file a bug if appropriate)**
Expires in 6mo, data retention set to 6mo in https://github.com/mozilla/taar/issues/127
Attachment #9021859 - Flags: review?(liuche) → review+
Assignee: jvehent → nobody
Data collection policy is now merged in on the TAAR repo.
Can we confirm that the patch to implement this is landed and will be available for testing in Nightly?
Flags: needinfo?(ddurst)
This is not landed, it was held up by the data review.  Now that we have both, unfortunately it is also the week before beta uplift.  Do we need this in 65, or can it wait on 66?
Flags: needinfo?(mlopatka)
Strong preference for 65 if it is in any way possible.
Our development roadmap and Q4/Q1 2019 OKRs were planned around 65. We need to have the pref landed in order to do any end-to-end testing of TAAR in a realistic manner.
Flags: needinfo?(mlopatka)
ok, I went ahead and landed it.
Flags: needinfo?(ddurst)
oh, as well, I mis-read the calendar, soft freeze is next week.
Thanks! 
Once we can verify proper functioning on an end-to-end TAAR workflow I'll close out this bug.

In the mean time (as per comment#30), I'm making this bug public.

Data collection is addressed in comment 35: https://bugzilla.mozilla.org/show_bug.cgi?id=1489531#c35
Group: mozilla-employee-confidential
Pushed by ccoroiu@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/374a4bf1482e
Expose telemetry client_id hash to about:addons via cookie r=Gijs,chutten
https://hg.mozilla.org/mozilla-central/rev/374a4bf1482e
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
Assignee: nobody → mixedpuppy
Depends on: 1510700

Turns out that the cookie is not passed through (seems like it's not considered first party), so our approach for this will revert to the header (see #c19).

NI the world to see if there are any concerns about this. I assume the data review portions of this are intact, and this is just about transmission method.

Status: RESOLVED → REOPENED
Flags: needinfo?(mlopatka)
Resolution: FIXED → ---
Flags: needinfo?(merwin)
Flags: needinfo?(jvehent)

(Setting P1 because this was 65, and we'd like to preserve that if possible)

Priority: -- → P1

Turns out that the cookie is not passed through (seems like it's not considered first party), so our approach for this will revert to the header (see #c19).

Sounds good to me.

Flags: needinfo?(jvehent)
m-c patch to add header to request on about:addons

Note this will require a small change for beta uplift, specifically the webNavigation.loadURI changes to loadUriWithOptions for beta.

(In reply to Shane Caraveo (:mixedpuppy) from comment #47)

Created attachment 9036718 [details]
Bug 1489531 Expose telemetry client_id header to about:addons

m-c patch to add header to request on about:addons

I tried this patch locally and made a few changes to the AMO/Disco Pane patch too. Running the Disco Pane locally inside about:addons now works! The client ID is passed from FF to the AMO/Disco app, which then forwards this ID to the AMO API (HTTP call).

SGTM

Flags: needinfo?(mlopatka)
Beta version of client id header patch for AMO discovery.

Comment on attachment 9037262 [details]
Bug 1489531 Expose telemetry client_id header to about:addons BETA

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: Recommendations will not be available

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: No

Needs manual test from QE?: Yes

If yes, steps to reproduce: load about:addons
load network monitor
view the "Get Add-ons" panel
in network monitor, look at request headers for the main request.
you should see a Moz-Client-ID header

Two prefs are required to be true for the header to be set:
datareporting.healthreport.uploadEnabled
browser.discovery.enabled

If either are false the header should not appear. So testing should be itterated along with changing these prefs

  1. both true
  2. uploadEnabled false, discovery.enabled true
  3. uploadEnabled true, discovery.enabled false

These steps would be the same for testing on m-c

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): The change is limited to about:addons. The patch adds a header when loading the discover panel on about:addons, no other browser should be affected with this.

String changes made/needed: none

Attachment #9037262 - Flags: approval-mozilla-beta?

@RyanVM, fyi patches approved, try runs are in comments above.

Flags: needinfo?(ryanvm)
Pushed by scaraveo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4ddef07cfb17
Expose telemetry client_id header to about:addons r=aswan
Flags: needinfo?(ryanvm)
Target Milestone: Firefox 65 → ---

R+ from QA.

Tested on the following builds (using the steps from comment 54):

Verified all the combinations of Boolean datareporting.healthreport.uploadEnabled and browser.discovery.enabled configs and confirmed the "Moz-Client-Id" header appeared only when both were true.

Status: REOPENED → RESOLVED
Closed: 8 months ago6 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 66

Spoke with merwin via email, clearing NI.

Flags: needinfo?(merwin)
Flags: qe-verify- → qe-verify+

Comment on attachment 9037262 [details]
Bug 1489531 Expose telemetry client_id header to about:addons BETA

[Triage Comment]
Fixes a header issue causing addon recommendations to not be available. Approved for 65.0b12.

Attachment #9037262 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [adv-main65-]

Marking this bug as verified based on comment #57, I've also verified using the steps provided on comment #54 and can confirm that the header only shows when both pref's listed in the above mentioned comment are set to true. Verified using Nightly 66.0a1 and 65.0b12.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Depends on: 1522810
You need to log in before you can comment on or make changes to this bug.