Closed Bug 1489531 Opened 2 years ago Closed 2 years ago
Expose telemetry client
_id to about:addons and AMO remove SHIELD dependency from TAAR
46 bytes, text/x-phabricator-request
|Details | Review|
4.24 KB, text/plain
47 bytes, text/x-phabricator-request
|Details | Review|
47 bytes, text/x-phabricator-request
|Details | Review|
No description provided.
In order to allow TAAR to function without a SHIELD addon to mediate the delivery of the Telemetry client_id, the
The telemetry client_id is then used to lookup other telemetry fields from a dynamoDB instance running apart from main Telemetry. These characteristics are the basis of personalized recommendations. We need to remove the necessity for a SHIELD addon to pass the client_id. TAAR repo: https://github.com/mozilla/taar
Does not appear to be a security bug. If this is incorrect please find us in #security or send mail to firstname.lastname@example.org
[:ddurst] specifically requested that we mark this bug as a security issue to keep it private. We will be discussing a potential change to the security model of Firefox in the context of exposing the client_id selectively, this would mean a fundamental change tot he FX attack surface.
Sorry, dveditz, I had this on my list late last week and neglected to comment. The proposal for this is to pass the client id as a header -- we would use this for AMO only initially, but certainly could see use cases for it wherever we use TAAR-based recommendations. That's the part that concerns me, and which I thought security would want to consider.
So this bug is a security review request, rather than a known flaw or feature work? If so please move to the firefox security review request component (or if it's AMO maybe we need ulfr's team instead)
Flags: needinfo?(dveditz) → needinfo?(ddurst)
OK. Sorry for the runaround. This is a review request for the proposed feature (passing the client ID via header to whitelisted sites).
Component: General → Security: Review Requests
Product: WebExtensions → Firefox
Also, I should probably loop in Marshall wrt Privacy because of the client ID (noting that this is the same client ID we use for telemetry).
We've reviewed this as part of the discussions about TAAR and have approved. We should consider this (passing the client ID via header to a whitelisted site) on a case by case basis do determine if the client ID is really needed and determine how the proposal will change the privacy risk. But as for the TAAR point, we are good with this moving forward. To Martin's broader point, it certainly seems worth considering the change to the security model in light of this work.
We had a follow up discussion with Martin and Travis today, and we would prefer to have Firefox send a hash of the client_id to AMO, to avoid having AMO know the actual client_id. This would prevent AMO from being in a position to correlate IDs with telemetry, while allowing us to take a given client_id, hash it, and look it up in the AMO logs. A simple sha256(client_id) ought to be enough here. needinfo :ddurst for his take on feasibility.
I don't have any issue with this.
This iteration of the patch only adds the header to the discover url, not to all AMO requests. A couple questions. - Why not post data or get param? - Does this really need to be attached to all AMO requests. Using a channel listener could do that easily, but adds some overheader, even if trivial, to all requests, where AMO requests are an insignificant portion of requests.
Adding an NI for ddurst. Perhaps some context that may help answer the second question. We would like to have the client_id hash available on both AMO and the about:addons page, which will allow us to improve recommendation quality on AMO for Firefox clients. This also has implications for the first question. If a client is browsing AMO, then no post is occurring.
(In reply to mlopatka from comment #14) > Adding an NI for ddurst. > > Perhaps some context that may help answer the second question. We would like > to have the client_id hash available on both AMO and the about:addons page, > which will allow us to improve recommendation quality on AMO for Firefox > clients. This also has implications for the first question. Already knew that. client_id is by nature available to us in about:addons, it needs to get to amo of course. The shield study added client_id to the discovery url, and if that's all that is necessary, then the approach is different than otherwise. The context I'm missing is, do we need it elsewhere and is that pervasive (i.e. all amo requests), if so why, or is that for one or two specific urls. > If a client is > browsing AMO, then no post is occurring. What is the use case for having client_id passed to every page when browsing? For example, a hashed client_id could just be passed on a ping url which would set a cookie so AMO can later access that. That way we don't have to install a listener or do any other modification.
:mixedpuppy The usecase we're trying to solve is : "TAAR requires the hashed telemetry ID available whenever any a.m.o page is loaded as TAAR is expected to generate suggestions for all Firefox users browsing https://addons.mozilla.org/". Whether or not this is implemented within a cookie or a header seems to be an implementation detail that TAAR itself isn't concerned with. All we care about is that AMO eventually makes a request to the TAAR service that includes the hash. I'm not familiar with ping URLs in AMO. Can we still guarantee that we'll get the hashed telemetry ID on every page? Will that work with cookie destroying add-ons?
Had a meeting w/Scott, Kev and ddurst. From that: There currently are no concrete plans to do any recommendations beyond disco pane, so we will go forward with the current patch here. If and when any plans to expand beyond disco pane solidify, we will likely change the implementation to either a header or cookie, but we will delay that conversation to such a time. Additionally, - The client_id will not be sent when the user is in private browsing mode. - The pref, bug 1499470, will also enable/disable sending the client id. - A couple other items are waiting decision which should happen (hopefully) monday.
Sounds like a good approach to me. Just a point of clarification, We already *are* providing recommendations beyond disco pane. Each addons detail page on AMO provides recommendations based on a model derived from addon co-installation data that is leveraging Firefox Telemetry data. The quality of those recommendations could be drastically improved with access to the hashed telemetry client_id. If we are not moving forward with revealing the hashed client_id on any AMO page, then we will not be able to iterate on the TAAR technology. The quality of recommendations can not be expected to improve.
Let's have the discussion about header vs cookie here. A more forward-looking option is to use one of these as the vehicle for the hashed ID. We're going to have a preference that results in the setting of a header for networks requests to AMO, or results in the setting of a cookie for AMO. In both cases there's some overhead. In the case of the cookie, there's also the edge case that a user or extension clears cookies and whether we reset that cookie on the next session start or we basically treat it as an intentional flip of the pref. Beyond that, the question we have is whether either solution should: a) be disabled if the user has enabled DNT (we suspect that as first-party here, we do not need to do that) b) be a concern going forward in terms of GDPR -- this is not an issue now, but if we were to include this ID on some other mozilla property in the future, it suddenly becomes a possible tracking issue. If neither the cookie nor header worries Legal or Privacy or Security, then it comes down to engineering choice on ease of implementation and/or performance. If either of the options does worry Legal or Privacy or Security, then that should inform which is the less problematic approach. I know this was sort of discussed in #c9, but worth calling it out now that the hashing, at least, is settled. And if this is something we'll just revisit when/if another (non-AMO) site is intended to be included, then so be it.
(Also back to :ulfr because the model changes with that discussion.)
Flags: needinfo?(ddurst) → needinfo?(jvehent)
I'm a proponent of the cookie approach. All the management of private browsing, ensuring secure channels, etc. are built in. If we do a different header, we then also have to implement a number of items that cookies give us in addition to listening on all requests to watch for amo requests. The downside of course is those users who use cookie removal extensions. A rough thumb to the air (search cookie on amo and guestimate a total user base) shows less than 1M users, less than 1% of our user base. That's edge case enough it shouldn't matter. Recommendations will already need to account for a lack of clientId for users who disable it (see bug 1499470). Security can look at the cookie version of the patch I uploaded to see if they have any concern.
No concern on my side. I like the cookie approach better too. We should still hash the ID per comment 10.
(In reply to Julien Vehent [:ulfr] from comment #23) > No concern on my side. I like the cookie approach better too. We should > still hash the ID per comment 10. It is hashed. As well, the cookies are marked secure, httpOnly, samesite-lax and non-PrivateBrowsing. One item I'd point out is that the patch uses a pref for the list of sites that will get the cookie, currently only AMO.
This requires Data Collection Review. (Since I'm involved in the technical review side, I cannot serve as Data Steward for the reivew). One piece of the review that might be very relevant is the need for public documentation of the collection (specifically including any ids being sent) and clear identification of the opt-out mechanism in that documentation and in the review. : https://wiki.mozilla.org/Firefox/Data_Collection
For the patch here specifically, it's well covered by automated tests. I think any QE involvement should probably focus on the bigger picture of the FX->AMO->TAAR interaction.
Submitting data collection review request form.
Attachment #9021859 - Flags: review?(francois)
Comment on attachment 9021859 [details] data collection review request As :chutten mentioned in comment #25, one of the requirements for our data collection is that this be documented publicly somewhere. This could be the data review request itself if this bug were public for example. I have read through this bug and some of the linked documents, but I haven't found anything that could qualify as public documentation for this. Did I miss something? Also, I opened the devtools network tab while loading about:addons and didn't see any third-party script resources on that page, which is great. Do we have a policy against adding third-party-hosted scripts on there in the future?
Attachment #9021859 - Flags: review?(francois) → review-
(In reply to François Marier [:francois] from comment #28) > Comment on attachment 9021859 [details] > data collection review request > > As :chutten mentioned in comment #25, one of the requirements for our data > collection is that this be documented publicly somewhere. This could be the > data review request itself if this bug were public for example. > > I have read through this bug and some of the linked documents, but I haven't > found anything that could qualify as public documentation for this. Did I > miss something? I'm thinking this bug could be made public. > Also, I opened the devtools network tab while loading about:addons and > didn't see any third-party script resources on that page, which is great. Do > we have a policy against adding third-party-hosted scripts on there in the > future? Not an issue. The cookie is set in a way that about:addons nor any content contained within it would never see cookie, only the server receiving the cookie will. The server will have to produce content based on that.
@ddurst, should we make this bug public?
I suppose so, though I expect that mlopatka should be the arbiter of where the public statement of this would live. Either here as a public bug, or elsewhere where we describe this unique hash more generally.
Flags: needinfo?(ddurst) → needinfo?(mlopatka)
I would propose that we include a detailed and transparent statement regarding the data collection in the TAAR repo. The original motivation for making this bug private was to avoid leaking potential changes to Firefox's attack surface until after we have discussed it. I have no problem making this bug public per-se, but I'd prefer to have a clearly articulated statement prepared and associated with the repo that we can point to from anywhere else where this discussion (or similar ones) may be had. I'll submit a PR to the TAAR repo with data collection documentation and then circle back here.
Public facing documentation added via PR: https://github.com/mozilla/taar/pull/131/files Comments/PRs welcome @ddurst, @mixedpuppy, @francois please take a look at the full `DATA_COLLECTION_POLICY.md` file under PR#131 and let me know if this is sufficiently and accurately capturing the contents of this bug. NI: Chutten to please flag an appropriate Data Steward as francois is currently not accepting NI requests. Renewed data steward review is requested with public facing data collection policy documentation now in the TAAR repo.
Comment on attachment 9021859 [details] data collection review request Any Data Steward from the list can be asked for review, so I choose... :liuche :) : https://wiki.mozilla.org/Firefox/Data_Collection
Attachment #9021859 - Flags: review- → review?(liuche)
Attachment #9021859 - Flags: review?(liuche) → review+
Data collection policy is now merged in on the TAAR repo. Can we confirm that the patch to implement this is landed and will be available for testing in Nightly?
This is not landed, it was held up by the data review. Now that we have both, unfortunately it is also the week before beta uplift. Do we need this in 65, or can it wait on 66?
Strong preference for 65 if it is in any way possible. Our development roadmap and Q4/Q1 2019 OKRs were planned around 65. We need to have the pref landed in order to do any end-to-end testing of TAAR in a realistic manner.
ok, I went ahead and landed it.
oh, as well, I mis-read the calendar, soft freeze is next week.
Thanks! Once we can verify proper functioning on an end-to-end TAAR workflow I'll close out this bug. In the mean time (as per comment#30), I'm making this bug public. Data collection is addressed in comment 35: https://bugzilla.mozilla.org/show_bug.cgi?id=1489531#c35
Pushed by email@example.com: https://hg.mozilla.org/mozilla-central/rev/374a4bf1482e Expose telemetry client_id hash to about:addons via cookie r=Gijs,chutten
Assignee: nobody → mixedpuppy
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Priority: -- → P1
m-c patch to add header to request on about:addons
Beta version of client id header patch for AMO discovery.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/4ddef07cfb17 Expose telemetry client_id header to about:addons r=aswan
You need to log in before you can comment on or make changes to this bug.